[Excerpts]

Abstract 

A new neural network model, a general-purpose master-slave neural network model, is presented. Its general-purpose nature is demonstrated by two master-slave control methods.

Key words: master-slave neural network, energy function, local minimum, system stability.

I. Introduction

Like earlier automata, existing artificial neural networks are dedicated systems (for example, systems for counting specific symbols). General-purpose computers emerged only after the concept of the "stored program" was introduced; the program determines the sequence of information processing. If this concept is applied to a neural network, we find that the network's function is determined by its architecture (which determines the sequence in which network information is processed) and by the degree of excitation of its neurons. The key step in designing a general-purpose neural network is therefore to build a master control module whose output either regulates the weights of the slave module or "clamps" the degree of excitation of the slave module. The structure of the master module is "programmed" to define the network's function. Figure 1 shows a simple master-slave neural network we have designed. The master module controls the weights of the slave module through several intermediate channels. The topology of the slave module is arbitrary (the slave network shown in the figure has a layered structure).

The proposed general-purpose master-slave neural network can be controlled in two different ways: (1) the output of the master network controls the degree of excitation of the slave network's neurons in the form of an external current; (2) the output of the master network regulates the link weights of the slave network. In practice, (1) and (2) may be combined.

Figure 1. A Simple Master-Slave Neural Network

For each of these two basic methods, a master-slave model has been designed and simulated on a computer.

II. General-Purpose Master-Slave Model With Link Weight Regulation

A BP network is essentially a nonlinear mapping function:1

$$Y = F_2\left(W_2 \cdot F_1(W_1 \cdot X)\right) \qquad (1)$$

where X and Y are the input and output vectors of the network, and W_j is the link weight matrix between the jth and (j + 1)th layers.
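As a minimal illustration of equation (1), the two-layer mapping can be written directly in NumPy. The logistic choice for F_1 and F_2, the layer sizes in the example, and the name `bp_forward` are assumptions made for this sketch; the text does not fix them.

```python
import numpy as np


def logistic(x):
    """Elementwise logistic nonlinearity, assumed here for both F_1 and F_2."""
    return 1.0 / (1.0 + np.exp(-x))


def bp_forward(W1, W2, X):
    """Evaluate Y = F2(W2 . F1(W1 . X)) for a two-layer BP network.

    W1 has shape (hidden, inputs), W2 has shape (outputs, hidden), and X is
    either one input vector (inputs,) or a batch of columns (inputs, samples).
    """
    H = logistic(W1 @ X)      # hidden-layer activations F1(W1 . X)
    return logistic(W2 @ H)   # network output F2(W2 . H)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W1 = rng.normal(size=(4, 6))   # 6 inputs -> 4 hidden units
    W2 = rng.normal(size=(1, 4))   # 4 hidden units -> 1 output
    X = np.array([0, 0, 1, 1, 0, 0], dtype=float)
    print(bp_forward(W1, W2, X))
```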

The present BP [backpropagation] network has several shortcomings. Its learning process is slow, and changes in the weights are difficult to realize in hardware. Furthermore, the learning algorithm is essentially a nonlinear optimization process, and there is no good method for overcoming the difficulties associated with local minima. To address these problems, we introduce the network shown in Figure 2. A Hopfield network with m × n neurons replaces the m × n weight matrix W of every layer of the BP network, and the output of each neuron represents a weight W_ij of the original BP network. Because a Hopfield network minimizes its energy function as it converges,2 this behavior can be used to help the BP network converge if the error function of the BP network is treated as the energy function of the Hopfield network. [passage omitted]

Figure 2. General-Purpose Master-Slave Neural Network With Tunable Weighted Interconnection
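Because equations (2)-(4) fall in the omitted passage, the following is only a sketch of why treating the BP error E as the Hopfield energy drives the weights downhill. It assumes standard continuous Hopfield dynamics in which each weight is the output of a neuron with internal state u_ij, w_ij = f(u_ij) with f monotonically increasing, and du_ij/dt = -∂E/∂w_ij; these dynamics are our assumption, not the authors' equations.

$$\frac{dE}{dt} = \sum_{i,j} \frac{\partial E}{\partial w_{ij}}\,\frac{dw_{ij}}{dt} = \sum_{i,j} \frac{\partial E}{\partial w_{ij}}\, f'(u_{ij})\,\frac{du_{ij}}{dt} = -\sum_{i,j} f'(u_{ij}) \left(\frac{\partial E}{\partial w_{ij}}\right)^{2} \le 0$$

Since f' > 0, the error can only decrease along a trajectory, and it stops changing only where every ∂E/∂w_ij = 0, that is, at a stationary point of the BP error. Such a point may be a local minimum, which is the situation the control layer in the next paragraph is intended to handle.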


A control layer is added on top of the Hopfield layer. A bias current is injected to make the error function of the BP network jump out of a local minimum. The same technique is also used in the other model discussed in this paper.
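One way to picture the bias-current injection is a rule that perturbs the internal states u of the Hopfield neurons (whose outputs are the BP weights) whenever the BP error has stopped improving. The stall test, the random form of the bias, and every name and parameter below (`inject_bias_current`, `window`, `tol`, `strength`) are hypothetical choices for this sketch; the paper does not specify how the bias current is generated.

```python
import numpy as np


def inject_bias_current(u, error_history, rng, window=50, tol=1e-6, strength=0.5):
    """Perturb Hopfield-layer states u when the BP error has stalled.

    Hypothetical rule: if the error improved by less than `tol` over the last
    `window` iterations, add a random bias current of scale `strength` to every
    neuron state, so the weights f(u) jump away from the current point.
    Returns (new_u, injected) where `injected` is a plain bool.
    """
    if len(error_history) <= window:
        return u, False                       # not enough history yet
    improvement = error_history[-window - 1] - error_history[-1]
    if improvement >= tol:
        return u, False                       # still descending: leave u alone
    bias = strength * rng.standard_normal(u.shape)
    return u + bias, True
```

In a full training loop this would be called once per iteration, after the Hopfield-layer update; the window, tolerance, and random bias are illustrative rather than tuned values.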

From the above, the following conclusions can be drawn for this model:

(1) As equation (3) shows, the weights of both the BP network and the Hopfield network are expressed in terms of neuron output states. The model is therefore independent of the particular problem, which makes the network more general-purpose, and a variety of nonlinear mappings can be realized relatively easily.

(2) The model is easy to build in hardware because the weight matrices can be realized with multipliers, which is more flexible and accurate than using resistors or RAM.

(3) The learning process of the BP network is completed automatically under the control of the Hopfield layer.

This model was used to solve the "mirror symmetry" recognition problem presented by D. E. Rumelhart. When an n-dimensional vector of n symbols is applied to the input of the network, the output should indicate whether the string is symmetric left to right (for example, 00111100 is a symmetric string). This is a classic test problem for the BP algorithm and places high demands on the stability of the BP network. Data provided by Rumelhart show that the network had to be trained approximately 90,000 times before converging when n = 6. In our model, the nonlinear mapping function of the Hopfield network is chosen to be f(u) = 1/(1 + exp(-a u)). When 0.2 < a < 2 and 0.1 < e < 0.8 (where e is the iteration step in equation (4)), the network converges after 32,000-60,000 training iterations. Figure 3 shows the number of iterations needed to reach convergence as a function of a and e. This number evidently depends on the product of a and e: at a particular value of the product, the number of iterations required to converge is smallest, and the farther the product is from this value, the more iterations are required, or the network may not converge at all.

Figure 3. a and e Curve
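The observation that only the product a·e seems to matter can be illustrated with a small sketch. Under the assumptions below (ours, since equations (2)-(4) are omitted), substituting v = a·u turns the update u ← u - e·∂E/∂w into v ← v - (a·e)·∂E/∂w, and the weights depend only on v; so if the initial v is the same, the whole weight trajectory depends on a and e only through their product. The sketch uses Rumelhart's mirror-symmetry task with n = 6 and a 6-2-1 network; the weight scaling w = w_max(2f - 1), the initialization, and all function names are hypothetical, and it does not reproduce the authors' simulation or iteration counts.

```python
import itertools
import numpy as np


def is_symmetric(bits):
    """True if the string reads the same left to right and right to left."""
    return bits == bits[::-1]


def symmetry_dataset(n):
    """All 2**n binary strings of length n, target 1 for mirror-symmetric ones."""
    strings = ["".join(p) for p in itertools.product("01", repeat=n)]
    X = np.array([[int(c) for c in s] for s in strings], dtype=float)
    T = np.array([1.0 if is_symmetric(s) else 0.0 for s in strings])
    return X, T


def sigmoid(x):
    # clipping only avoids overflow warnings; it does not change the results
    return 1.0 / (1.0 + np.exp(-np.clip(x, -500, 500)))


def train(a, e, steps=200, hidden=2, w_max=4.0, seed=0):
    """Hypothetical Hopfield-layer training of a 6-2-1 BP net on the symmetry task.

    Each weight is a Hopfield neuron output w = w_max * (2 f(u) - 1), where
    f(u) = 1/(1 + exp(-a u)), and states follow the Euler step u <- u - e * dE/dw.
    Returns the list of weight vectors seen during training.
    """
    X, T = symmetry_dataset(6)
    Xb = np.hstack([X, np.ones((len(X), 1))])        # constant bias input
    n1 = hidden * Xb.shape[1]
    rng = np.random.default_rng(seed)
    v0 = 0.1 * rng.standard_normal(n1 + hidden + 1)
    u = v0 / a                                       # same initial v = a*u for every a
    history = []
    for _ in range(steps):
        w = w_max * (2.0 * sigmoid(a * u) - 1.0)
        W1 = w[:n1].reshape(hidden, Xb.shape[1])
        W2 = w[n1:]                                  # hidden-to-output weights + bias
        H = sigmoid(Xb @ W1.T)
        Hb = np.hstack([H, np.ones((len(H), 1))])
        Y = sigmoid(Hb @ W2)
        # backpropagate E = 0.5 * sum((Y - T)**2)
        dY = (Y - T) * Y * (1.0 - Y)
        gW2 = Hb.T @ dY
        dH = np.outer(dY, W2[:hidden]) * H * (1.0 - H)
        gW1 = dH.T @ Xb
        grad_w = np.concatenate([gW1.ravel(), gW2])
        u = u - e * grad_w                           # Hopfield Euler step on the BP error
        history.append(w)
    return history
```

For example, `train(0.5, 0.4)` and `train(1.0, 0.2)` produce the same weight trajectory in this sketch, while `train(1.0, 0.4)` does not.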


III. General-Purpose Master-Slave Model Using External Current Control

To solve an NP-hard problem in conventional optimization, the traveling salesman problem (TSP), we designed a master-slave neural network of the kind shown in Figure 4. The slave network is a Hopfield network and the master network is a conventional computer. An effective Lyapunov function has been designed based on the link matrix. The convergence of the new model and the validity of its solutions are verified by analyzing the eigenvalues of the network link matrix and the network dynamic equation. [passage omitted]

Figure 4. Master-Slave Model Using External Current Control
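The paper's own Lyapunov function falls in the omitted passage. For orientation only, here is a sketch of the widely cited Hopfield-Tank TSP energy, a different, swapped-in formulation of the kind a Hopfield slave network descends. V is an n × n matrix of neuron outputs with V[x, i] = 1 meaning city x is visited at tour position i, d is a distance matrix with zero diagonal, and the penalty weights A, B, C, D are hypothetical defaults.

```python
import numpy as np


def tsp_energy(V, d, A=500.0, B=500.0, C=200.0, D=500.0):
    """Hopfield-Tank style TSP energy for a state matrix V (city x position).

    The A, B, C terms penalize violations of the permutation-matrix
    constraints; the D term is proportional to the tour length.
    """
    n = V.shape[0]
    row = V.sum(axis=1)
    col = V.sum(axis=0)
    # A: the same city active in more than one position
    e_a = 0.5 * A * np.sum(row**2 - np.sum(V**2, axis=1))
    # B: more than one city active in the same position
    e_b = 0.5 * B * np.sum(col**2 - np.sum(V**2, axis=0))
    # C: total number of active neurons should equal n
    e_c = 0.5 * C * (V.sum() - n) ** 2
    # D: distances to the cities in the next and previous positions (cyclic tour)
    neighbors = np.roll(V, -1, axis=1) + np.roll(V, 1, axis=1)
    e_d = 0.5 * D * np.sum(d * (V @ neighbors.T))
    return e_a + e_b + e_c + e_d
```

For a valid tour (a permutation matrix) the three penalty terms vanish and the energy reduces to D times the tour length, since each tour edge is counted once from each end.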


IV. Conclusions

The master-slave models presented in this paper not only capture a characteristic of complex biological neural systems (master-slave differentiation)8 but are also, to some extent, general-purpose because they make use of a conventional computer. Master-slave differentiation means that information is processed separately according to its relevance. This reduces the communication load and makes it feasible to build a coarse-grained artificial neural network processor.

[Text]


Abstract 

A floating-gate NMOS transistor neural network has been designed. Its features include a simple structure, small chip area, and continuously adjustable interconnection weights. In addition, it has the characteristics of distributed neurons and can be cascaded to form a large network. An 8 × 8 interconnected neuro-chip has been fabricated in a 3 μm floating-gate NMOS process. The chip contains 128 programmable floating-gate NMOS transistors, equivalent to a fully interconnected network of 8 neurons. It has been used for numeral recognition and binary image processing. The results show that the network has considerable potential. Furthermore, its simple structure and compatibility with IC technology make it highly flexible and easy to fabricate.

Key words: artificial neural network, IC, circuit principle and design.

I. Introduction

Theoretical analysis of neural networks is still in progress, and research on neural networks must still rely on analog tools. Solving complex problems with large-scale parallel systems requires hardware support, so considerable interest is focused on the implementation of neural networks. Many electronic methods are available for implementing neural networks. In VLSI, these include digital circuits,1 analog circuits,2 digital/analog circuits,3 voltage-mode circuits, current-mode circuits,4 pulse-current circuits,3 etc. The major feature of a neural network is that a large nonlinear dynamic system is formed by interconnecting a large number of simple elements. When such a network is implemented as a parallel system, each element ought to be simple in structure and small in area so that a large-scale network can be formed. The high computing power and flexibility of a neural network come mainly from adjustable synaptic weights, so it is imperative to be able to fabricate neural networks with continuously adjustable weights.

The basic element of a neural network is not only an operator but also a storage device, so its complexity and area are determined by both operation and storage requirements. As far as operation is concerned, most conventional methods are based on theoretical algorithms and do not fully exploit the physical characteristics of the device, which makes a simple design very difficult to achieve. Active devices in a digital circuit behave as switches: a device operates only in the saturation and cutoff regions, so its characteristics are poorly utilized. As a result, a circuit that carries out a complete computation is complicated, occupies a large chip area, and computes slowly. Nevertheless, this kind of circuit has a high tolerance to noise and is the most mature circuit technology in use. In an analog circuit, an active device operates in its linear region, and the physical laws of that region can be used to perform computations. Compared with digital circuitry, device utilization is higher, chip area is smaller, and computing speed is faster. However, the circuit is more susceptible to noise: even if each individual device is sufficiently accurate, errors accumulate, and the final result of the entire computation is severely affected by noise. It is therefore difficult to build a large-scale computing system from this kind of circuitry. Because the correctness of a neural network depends not only on the accuracy of each individual device but also on network feedback, the network does not require absolutely precise device parameters, rigorously matched devices, or accurate time constants. We should therefore be able to exploit the physical characteristics of the device more fully than an analog circuit does (i.e., let the transistor operate over its entire characteristic region), so that the basic computation element has the simplest structure and occupies the least area.

The storage of analog data in the basic element is one of the key issues in implementing neural networks, and the storage behavior also determines the performance of the entire network. The floating-gate structure is a simple analog storage device with a small chip area. It consumes little power for erasing and writing, and its charge leakage is low, which makes it especially suitable for storing synaptic weights.

A simple and highly flexible neuro-chip has been designed and fabricated in a 3 μm floating-gate NMOS process. It uses floating-gate NMOS devices to perform both the computation and the storage functions of the basic element. This not only greatly simplifies the programmable network but also makes it compatible with IC technology. A numeral-recognition model was built and is used as an example to demonstrate the chip's application as a Hamming network. The chip was also used for noise cancellation and edge detection in binary images to show that it can be used in variable-threshold circuits.


II. Design and Implementation of Neuro-Chip

Figure 1 shows the neuron of a network built from floating-gate NMOSFETs. Since all neurons in this network are structurally identical, their transistors have essentially the same conductivity factor and parasitic capacitance. Hence, the interconnect strength can be adjusted only through the threshold voltage. The dynamic equation describing this kind of nonlinear network is as follows:


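$$\frac{dU_i}{dt} = K\left[\sum_{j=1}^{N} F_{ij}\left(U_i, U_j, V'_{Tij}\right) + \sum_{j=N+1}^{N+M} F_{ij}\left(X_{j-N}, U_i, V'_{Tij}\right)\right]$$

$$F_{ij} = \begin{cases} f\left(U_j - U_i,\; V_{DD} - U_i,\; V'_{Tij}\right), & \text{excited interconnect} \\ \text{(expression illegible in the source)}, & \text{inhibited interconnect} \end{cases}$$

$$i = 1, \ldots, N, \qquad j = 1, \ldots, N + M$$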


where K is the ratio of the transistor conductivity factor to the neuron input capacitance, V_DD is the power supply voltage, N is the number of neurons, M is the number of external inputs to the network, and f is the characteristic function of the floating-gate transistor. The first term inside the brackets on the right-hand side represents the network feedback, and the second term represents the feed-forward part of the network. This neural network is thus built directly on the nonlinear characteristics of the transistor.
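To see how the excited-interconnect term of the dynamic equation behaves, the following sketch integrates it with forward Euler. The square-law MOSFET model standing in for f, the parameter values, and the names `mos_current` and `simulate` are assumptions; because the inhibited-interconnect expression is not legible in the source, only excited links are simulated.

```python
import numpy as np


def mos_current(vgs, vds, vt, beta=1.0):
    """Square-law NMOS drain current, assumed here as a stand-in for f.

    Cutoff for vgs <= vt, triode region for vds < vgs - vt, saturation otherwise.
    """
    vov = vgs - vt
    if vov <= 0.0 or vds <= 0.0:
        return 0.0
    if vds < vov:
        return beta * (vov * vds - 0.5 * vds**2)
    return 0.5 * beta * vov**2


def simulate(U0, X, vt_exc, K=1.0, vdd=5.0, dt=1e-3, steps=5000):
    """Forward-Euler integration of dU_i/dt = K * sum_j f(U_j - U_i, VDD - U_i, V'_Tij).

    Only excited interconnects are modeled. vt_exc has shape (N, N + M): the
    programmed thresholds for the feedback inputs U_1..U_N followed by the
    external inputs X_1..X_M; np.inf disables a link. Returns the state trace.
    """
    U = np.array(U0, dtype=float)
    N = len(U)
    trace = [U.copy()]
    for _ in range(steps):
        drive = np.concatenate([U, X])   # gate voltages: U_1..U_N, then X_1..X_M
        dU = np.zeros(N)
        for i in range(N):
            for j in range(N + len(X)):
                dU[i] += mos_current(drive[j] - U[i], vdd - U[i], vt_exc[i, j])
        U = U + dt * K * dU
        trace.append(U.copy())
    return np.array(trace)
```

With one external input held at V_DD and a programmed threshold V'_T, the neuron voltage in this sketch rises toward V_DD - V'_T, where the excited transistor cuts off; the programmed threshold therefore sets how strongly, and for how long, the connection drives the neuron.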

Figure 2 shows the circuit of the floating-gate NMOS transistor neural network built from this neuron structure. It consists of three parts. The first part is the basic network, which contains 8 × 8 basic interconnected elements. The second part controls the operation and includes Mc1-8, Mg1-8, and Mi1-8. Mc1-8 control the interconnect states: when φc is high, the network operates in feedback mode, and when φc is low, it operates in non-feedback mode. Mg1-8 clear the neuron dendrite lines (outputs). Mi1-8 control the signal input; in addition, they prevent the gates of the neurons from being connected directly to the input pins, which protects the programmed state of the network when φi is low. The third part is a programming control circuit consisting of Mo1-8 and Me1-8, which switch the programming between excited and inhibited interconnects. When φo is high and φe is low, the excited interconnects are programmed; when φo is low and φe is high, the inhibited interconnects are programmed.

Figure 1. Floating-Gate NMOS Neuron Structure

This NMOS transistor neural network has a distributed neuron structure. One disadvantage of the conventional resistor-amplifier neural network is that the fan-in and fan-out of the amplifiers must increase as the network expands, while each amplifier must still be able to amplify the weakest signal. This makes amplifier design difficult. Hence, a distributed neuron network structure6 was proposed, in which the number of amplifiers is increased to reduce the fan-in and fan-out each amplifier must handle. In a sense, each transistor in a transistor neural network performs the functions of both a resistor and an amplifier in a distributed neuron network. Because this network has the features of a distributed neuron network, it can be conveniently cascaded into a larger network or a network with a different structure, which improves the adaptability and flexibility of the hardware. It also simplifies the design of neural networks at the circuit-board level.

To shorten the leads, the output lines run horizontally in the chip layout and the input lines run perpendicular to them. Each basic element in the network consists of two floating-gate NMOSFETs: one forms the excited interconnect between the programming electrode and the source, and the other forms the inhibited interconnect, which has an independent programming electrode. To make the excited and inhibited interconnects as symmetric as possible, both transistors use the same layout pattern.

The circuit on the 3 μm floating-gate NMOS chip occupies 2.5 × 3.3 mm². It includes 128 programmable NMOSFETs and more than 50 peripheral transistors, as shown in Figure 3 [photograph not reproduced]. Measurements show that the breakdown voltage of the transistors in the peripheral circuit is high enough for the programming voltage to be applied to the basic elements. The programming characteristics of the basic interconnect element are described in reference 7.


III. Hamming Network Constructed With Neural Network Chips

The Hamming network is a two-layer network consisting of a maximum network and a template-matching network. It compares an input vector with stored template vectors and determines the template with the minimum Hamming distance from the input. This is a common operation in pattern recognition.

A useful result of biological neural network research is the discovery of lateral inhibition. One of the basic network types built on this principle is the competitive network, and the survival-of-the-fittest network8 is a classic example. Its function is to locate the maximum among a set of input data through network operation: through competitive evolution, the neuron corresponding to the maximum input reaches the maximum output while the other outputs drop to the minimum. When this function is implemented with the chip circuit, the network operates in full-feedback mode: the interconnects between each neuron and the other neurons are inhibitory, while its connection to itself is excitatory. In a transistor neural network, an interconnecting transistor not only sets the weight but also amplifies. The relative control that each transistor exerts on the output current reflects the strength of the interconnect matrix, while the absolute control of the output current depends on the electrical properties of the whole network (such as high- or low-voltage output, time delay, etc.).

Another important application of neural networks is to compare an input vector with template vectors already stored in the network and to calculate the degree of match between them. In principle, this computation can be accomplished with three different interconnect methods.5 For the survival-of-the-fittest maximum network built from the chips described earlier, it is more appropriate to use inhibitory interconnect elements to store the templates of the matching network.

Connecting a survival-of-the-fittest network to a matching network yields a Hamming network. The matching templates for the numerals 1-8 are shown in Figure 4. Since each template is a 15-dimensional vector, two 8 × 8 network chips are required to compute template similarity, and one more 8 × 8 chip is required to find the maximum among the eight matching scores. The Hamming network for numeral recognition therefore requires three chips. Figure 5 shows the circuit of the entire system. By programming the interconnect weights, the three chips are connected in the manner shown in the circuit. The figure omits interconnects whose threshold voltage is higher than the supply voltage, because such transistors never turn on and play no role in the circuit. The programmed circuit was connected to a computer to conduct numeral-recognition tests. Figure 6 shows the results of a series of experiments. In each pair of numerals in the figure, the left one is the input vector and the right one is the final recognition result.


Figure 4. Matching Templates for Numbers 1-8

Figure 5. Circuit of Hamming Network for Recognition of Numbers 1-8

Figure 6. Experimental Result of Number Recognition


IV. Variable-Threshold Logic Network Constructed With Chip Circuit

The threshold logic concept was introduced by McCulloch and Pitts9 to describe mathematical models of neurons and Boolean functions. Variable-threshold logic is threshold logic with variable interconnect weights and a variable threshold. A threshold element is a logic element with binary inputs and an output of either "0" or "1"; its behavior is determined by its weights and threshold. A single threshold element can implement only a linearly separable function, whereas a network of threshold elements can implement any logic function.
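A threshold element and the linear-separability limit can be checked in a few lines. This sketch models the element as output 1 iff the weighted input sum reaches the threshold; the helper names and the small integer search grid are assumptions made for illustration. The exhaustive search only covers that grid, so XOR failing there is consistent with, not a proof of, its linear inseparability (which holds for all real weights).

```python
import itertools


def threshold_element(weights, theta):
    """McCulloch-Pitts threshold element: output 1 iff sum(w * x) >= theta."""
    def element(*x):
        return int(sum(w * xi for w, xi in zip(weights, x)) >= theta)
    return element


def realizes(fn, weights, theta, n):
    """True if one threshold element reproduces fn on all 2**n binary inputs."""
    elem = threshold_element(weights, theta)
    return all(elem(*x) == fn(*x) for x in itertools.product((0, 1), repeat=n))


def find_single_element(fn, n, values=range(-3, 4)):
    """Brute-force small integer weights and thresholds; None if none works."""
    for w in itertools.product(values, repeat=n):
        for theta in values:
            if realizes(fn, w, theta, n):
                return w, theta
    return None
```

For example, weights (1, 1) with threshold 2 give AND and with threshold 1 give OR, while no grid point reproduces XOR.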

The transistor neuro-chip can conveniently be used as a variable-threshold element; its circuit is shown in Figure 7. The threshold voltages of the inhibitory interconnect transistors determine the weights, and the threshold voltage of transistor MT determines the threshold of the element. Mc11-Mc22 perform operations such as weighted input and threshold comparison. The output of this threshold element is complementary.


Figure 7. Transistor Neuro-Chip as a Threshold Element


As an example, a variable-threshold logic network has been used to remove noise from a binary image. A 3 × 3 pixel window is moved sequentially across the binary image, and the number of black pixels in the window is counted at each position. In the new image, the output pixel is black only when this count is greater than or equal to the threshold. This noise-cancellation method is simple to realize with a logic circuit: because its logic function is linearly separable, it can easily be implemented with a threshold logic circuit.
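The count-and-threshold rule above is a single threshold element with nine unit weights, applied at every window position. This sketch assumes 1 = black, treats pixels outside the image as white, and uses the hypothetical name `window_threshold_filter`.

```python
import numpy as np


def window_threshold_filter(img, threshold):
    """3x3 count-and-threshold filter for a binary image (1 = black).

    Each output pixel is black iff at least `threshold` of the 9 pixels in the
    window centered on it are black; pixels outside the image count as white.
    Equivalent to one threshold element with all weights 1.
    """
    img = np.asarray(img, dtype=int)
    padded = np.pad(img, 1, mode="constant", constant_values=0)
    rows, cols = img.shape
    counts = np.zeros_like(img)
    for dr in range(3):
        for dc in range(3):
            counts += padded[dr:dr + rows, dc:dc + cols]   # shifted window sums
    return (counts >= threshold).astype(int)
```

Lowering `threshold` keeps more black pixels (and more noise); raising it removes isolated pixels more aggressively but also erodes corners of genuine shapes.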

Figure 8 shows the results of the actual processing. Panel (a) shows the noisy binary input image; (b), (c), and (d) are the processed results for successively lower thresholds. Because the circuit operates internally in analog mode, the interconnect strength can be programmed to be continuously tunable. Hence, the threshold can be chosen according to the circuit conditions and the desired quality of the result. This is quite different from purely digital operation.

Figure 8. Experimental Result of Noise Cancellation of Binary Images

As another example, a variable-threshold network capable of handling a linearly inseparable logic function has been used to detect edges in a binary image. In the experiment, a four-neighbor Laplace operator window is used to detect the edges of a binary image: when all five pixels (the center pixel and its four neighbors) are identical, the output is 0; otherwise, it is 1. This is a linearly inseparable logic function and cannot be realized by a single threshold element. Although the function is linearly inseparable in 5-dimensional space, it becomes linearly separable in a higher-dimensional space. Let X6 = X1 + X2 + X3 + X4 + X5 (the logical OR of the five inputs) and use X6 as a sixth input to the threshold element; the logic function then becomes linearly separable and can be handled by a single threshold element. Figure 9 shows the circuit used for edge detection with this method. In theory, this algorithm could also be implemented with a variable-threshold logic network, but that would require more neuro-chips, so it was not used. Figure 10 shows the network's processing results, where (a) is the 34 × 34 pixel binary input image and (b), (c), and (d) are the results obtained with different threshold values. This edge-detection example shows that transistor neural networks can implement not only variable-threshold logic circuits but also common logic gates. The circuit used to implement a logic function should therefore be considered carefully so that the simplest one is selected.
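The construction above can be checked exhaustively. This sketch realizes the five-pixel edge function with one threshold element whose sixth input is the OR of the five pixels; reading "+" as logical OR, the weights (-1, -1, -1, -1, -1, +5), the threshold 1, and the function names are our assumptions, not the values programmed on the chip. Pixels outside the image are treated as white.

```python
import itertools
import numpy as np


def edge_element(x1, x2, x3, x4, x5):
    """Four-neighbor edge detector: one OR gate plus one threshold element.

    X6 = X1 OR ... OR X5 is fed in as a sixth input. With weights
    (-1, -1, -1, -1, -1, +5) and threshold 1, the weighted sum is 5 - k when
    k >= 1 of the five pixels are black and 0 when all are white, so the
    output is 0 exactly when all five pixels are identical.
    """
    x6 = int(x1 or x2 or x3 or x4 or x5)
    s = -(x1 + x2 + x3 + x4 + x5) + 5 * x6
    return int(s >= 1)


def detect_edges(img):
    """Apply edge_element to each pixel and its 4 neighbors (outside = white)."""
    img = np.asarray(img, dtype=int)
    p = np.pad(img, 1, mode="constant", constant_values=0)
    out = np.zeros_like(img)
    for r in range(img.shape[0]):
        for c in range(img.shape[1]):
            # center, up, down, left, right in padded coordinates
            out[r, c] = edge_element(p[r + 1, c + 1], p[r, c + 1], p[r + 2, c + 1],
                                     p[r + 1, c], p[r + 1, c + 2])
    return out
```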

Figure 9. Edge Detection Circuit Using Four-Neighboring-Window Laplace Operator

Figure 10. Experimental Results of Binary Image Edge Detection


V. Conclusions

A simple neuro-chip with variable interconnect weights has been designed and fabricated using 3 μm floating-gate NMOS technology. The network behaves like a distributed neuron structure and can be cascaded into a large-scale network. The main part of the circuit is an 8 × 8 fully interconnected matrix, equivalent to an 8-neuron network. Weights are stored in the floating-gate structure.

By connecting several floating-gate NMOS networks, a Hamming network was formed for numeral recognition and a variable-threshold logic network was created for noise cancellation in binary images. A hybrid circuit of binary and variable-threshold logic was also formed to detect edges in binary images. All results were satisfactory. In these applications, the interconnect weights were written electrically and survived numerous write/erase cycles. This shows that the chip is promising and highly flexible in a variety of applications.

Abstract

The optimum-learning-rate backpropagation algorithm is reviewed. Equations for computing the optimum learning rates of several commonly used networks are presented, problems associated with implementing the algorithm are discussed, and the speed of the algorithm is further illustrated by experimental simulations.
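The reviewed idea, choosing the learning rate at the minimum of a local second-order model of the target function along the steepest-descent direction, can be illustrated for a sum-of-squares error. This is a Gauss-Newton-style sketch under our own assumptions (error E = ½‖y - t‖², a known output Jacobian J, and the hypothetical name `optimum_learning_rate`); it is not the paper's formulas for specific networks.

```python
import numpy as np


def optimum_learning_rate(J, e):
    """Learning rate minimizing a local quadratic model of E = 0.5 * ||e||^2.

    J is the Jacobian of the network outputs with respect to the weights and
    e = y - t is the output error. Linearizing y(w - mu*g) ~ y - mu * J g along
    the steepest-descent direction g = J^T e gives
    E(mu) ~ 0.5 * ||e - mu * J g||^2, minimized at mu* = (g . g) / ||J g||^2.
    Returns (mu_star, g).
    """
    g = J.T @ e                 # gradient of E with respect to the weights
    Jg = J @ g                  # first-order change of the outputs per unit mu
    denom = Jg @ Jg
    if denom == 0.0:
        return 0.0, g           # flat direction: no step
    return (g @ g) / denom, g
```

For a model whose outputs are linear in the weights the linearization is exact, so μ* is the exact minimizer along the gradient direction; for a multilayer network it is only a local estimate that must be recomputed every iteration.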

Key words: optimum learning rate, BP algorithm, multilayer neural network, prediction.

I. Introduction

It has been demonstrated theoretically that a layered feed-forward neural network can approximate any given continuous mapping $f: \mathbb{R}^N \to \mathbb{R}^M$ if its topology and interconnect weights are properly selected. This type of neural network plays an important role in artificial neural network research, especially in the recognition and classification of sensory signals such as voice and images, and in nonlinear signal processing applications such as adaptive filtering, adaptive control, and mapping approximation. It is widely used as a critical part of a system or as the entire system itself, so the mapping it creates is often an important factor in system performance. Nevertheless, no effective algorithm has been found for automatically designing the topology and interconnect weights of a multilayer feed-forward network that provides a given mapping. In practice, the problem is converted into a two-stage experimental design optimization problem: an architecture is selected, and a learning algorithm trains the weights until the network's target function reaches an optimum or satisfactory level for that architecture; then the architecture is changed and the weight training is repeated. Finally, all results are compared to choose the combination of architecture and weights that gives the best target function. The total time required for such a two-stage design process equals the number of architectures tried multiplied by the average time needed to obtain optimum or satisfactory weights for each architecture. Therefore, for a given network architecture and initial weights, reducing the learning time needed to find weights with a satisfactory target function value is of practical significance.

In practice, backpropagation (BP)1 is the most popular algorithm for learning the weights of a multilayer feed-forward neural network. However, conventional BP is seriously affected by the arbitrariness of its fixed, hand-selected parameters. For a given network architecture and initial interconnect weights, one often needs to try different algorithm parameters and repeat the learning process to obtain a satisfactory solution. This not only significantly slows down the design process but also diverts attention from the system architecture and functionality.
Specifically addressing the difficulty caused by the arbitrary choice of a constant parameter in backpropagation, reference 2 presented a method for analyzing the learning rate of an adaptive, improved BP algorithm. The method is formalized in reference 3, which also provides general formulas. In essence, it locally linearizes the nonlinear processing elements of the network to obtain a second-order (quadratic) characteristic curve of the target function with respect to the parameter (the learning rate $\mu$) along the direction of steepest descent, and it uses this curve as a local approximation of the target function. The quadratic characteristic curve is completely determined by the gradient at the current iteration, the error of the processing elements in the output layer, and the perturbation of that error. Its shape varies as the curvature of the target function changes at the point of interest. In every iteration, the optimum learning rate $\mu^*$ moves the weights to the minimum of the characteristic curve generated in that iteration. Adapting the learning rate to the curvature of the network's target function is intended to remove the need to "tune" the parameters of the algorithm, to improve how closely the network approximates the desired mapping, and to speed up network learning. This paper describes the algorithm and reviews its results. Problems associated with implementing the algorithm are also discussed, and specific formulas for calculating the optimum learning rate in frequently used networks are provided. The speed of the algorithm is illustrated by simulation. (The derivation and proof of the algorithm are presented in reference 4.) [passage omitted]
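The formulas of references 3 and 4 are not reproduced in this excerpt. As an illustration only, the sketch below assumes a Gauss-Newton-style linearization: along the steepest-descent direction the output errors are treated as linear in $\mu$, so the squared error becomes a parabola in $\mu$ whose minimum gives $\mu^*$. The names `optimum_learning_rate` and `forward`, and the finite-difference estimate of the output perturbation, are hypothetical choices for this sketch, not the authors' method.

```python
import numpy as np

def optimum_learning_rate(forward, w, grad, targets, eps=1e-6):
    """Step length mu* along -grad for E = 0.5 * ||targets - forward(w)||^2.

    The outputs are linearized along the step: forward(w - mu*grad) ~ y0 + mu*delta,
    so E(mu) ~ 0.5 * ||e - mu*delta||^2, a parabola with its minimum at
    mu* = <e, delta> / <delta, delta>.
    """
    y0 = forward(w)
    e = targets - y0                                   # output-layer error
    delta = (forward(w - eps * grad) - y0) / eps       # output perturbation along -grad
    denom = float(np.sum(delta * delta))
    if denom == 0.0:                                   # outputs insensitive to the step
        return 0.0
    return float(np.sum(e * delta)) / denom

# Linear model y = X w, for which the parabola is exact
X = np.array([[1.0, 0.0], [0.0, 2.0]])
t = np.array([1.0, 1.0])
w = np.zeros(2)
grad = -X.T @ (t - X @ w)                              # dE/dw
mu = optimum_learning_rate(lambda v: X @ v, w, grad, t)
print(round(mu, 6))                                    # 0.294118 (= 5/17)
```

For a model that is exactly linear in its weights the parabola is exact, so one step with $\mu^*$ reaches the minimum along the gradient direction; for a sigmoid network it is only a local approximation recomputed at every iteration.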

IV.  Computer  Simulation  Experiment 

The objective of the experiment is to determine whether, and by how much, the optimum-learning-rate BP algorithm improves on the conventional BP algorithm in learning speed and in how closely the network approximates the desired mapping. The effect of the initial weights on the results is also investigated, and the computational cost of the two algorithms is compared with their benefit.

The task is to train a single-hidden-layer feed-forward neural network to learn the Feigenbaum mapping produced by the following nonlinear iteration:

$$x(n+1) = r\,x(n)\,[1 - x(n)], \qquad n = 0, 1, 2, \ldots \qquad (22)$$

where the initial value satisfies $0 < x(0) < 1$ and $r$ is a control parameter. In our experiment, $x(0) = 0.0100004$ and $r = 4.0$. By iteration, equation (22) then generates a chaotic time sequence.

The layered feed-forward neural network learns the Feigenbaum mapping with weights that start from random initial values. Eventually, the network should produce an output $\hat{x}(n+1)$ in response to an input $x(n)$ in accordance with equation (22), i.e., the squared error $|\hat{x}(n+1) - x(n+1)|^2$ is held to a minimum. This can also be viewed as the layered feed-forward network predicting the time sequence generated by equation (22). In the learning process, each experimental data block has length $T = 16$, and each new block renews one input/ideal-output pair.
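A minimal sketch of the training data follows, assuming NumPy and assuming that successive blocks form a sliding window that advances by one input/ideal-output pair (our reading of how each block is renewed). The function names and the sequence length are hypothetical.

```python
import numpy as np

def logistic_sequence(x0=0.0100004, r=4.0, length=200):
    """Iterate equation (22): x(n+1) = r * x(n) * (1 - x(n))."""
    x = np.empty(length)
    x[0] = x0
    for n in range(length - 1):
        x[n + 1] = r * x[n] * (1.0 - x[n])
    return x

def training_block(x, start, T=16):
    """Inputs x(n) and ideal outputs x(n+1) for n = start, ..., start + T - 1."""
    return x[start:start + T], x[start + 1:start + T + 1]

x = logistic_sequence()
inputs, targets = training_block(x, start=0)
# Under the sliding-window assumption, block k uses start = k, so each new
# block drops the oldest pair and brings in one new pair.
```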

The network architecture can be chosen arbitrarily. In our experiment, we used a single-hidden-layer feed-forward network with non-zero thresholds, consisting of one linear input processing element, six hidden processing elements with sigmoid response functions, and one linear output processing element. The computation load for each iteration is shown in Table 1.
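For reference, the conventional BP baseline on this (1 6 1) network can be sketched as follows. The weight initialization range, the per-block summed squared error, and all function names are assumptions made for illustration; the paper does not specify them.

```python
import numpy as np

def init_weights(rng, hidden=6):
    """Random initial weights; the range [-0.5, 0.5] is an arbitrary choice."""
    return {
        "w1": rng.uniform(-0.5, 0.5, hidden),   # input -> hidden weights
        "b1": rng.uniform(-0.5, 0.5, hidden),   # hidden thresholds
        "w2": rng.uniform(-0.5, 0.5, hidden),   # hidden -> output weights
        "b2": rng.uniform(-0.5, 0.5),           # output threshold
    }

def forward(p, x):
    """(1 6 1) network: linear input, sigmoid hidden layer, linear output."""
    h = 1.0 / (1.0 + np.exp(-(np.outer(x, p["w1"]) + p["b1"])))   # shape (T, hidden)
    return h @ p["w2"] + p["b2"], h

def bp_step(p, x, t, lr):
    """One conventional BP update with a fixed learning rate lr.

    Minimizes E = 0.5 * sum((t - y)^2) over the block; returns E before the update.
    """
    y, h = forward(p, x)
    e = t - y                                      # output error, shape (T,)
    dh = np.outer(e, p["w2"]) * h * (1.0 - h)      # error back-propagated to the hidden layer
    p["w2"] = p["w2"] + lr * (h.T @ e)
    p["b2"] = p["b2"] + lr * e.sum()
    p["w1"] = p["w1"] + lr * (x @ dh)
    p["b1"] = p["b1"] + lr * dh.sum(axis=0)
    return 0.5 * float(e @ e)

rng = np.random.default_rng(0)
x = np.linspace(0.05, 0.95, 16)                    # one block of T = 16 inputs
t = 4.0 * x * (1.0 - x)                            # ideal outputs from equation (22)
p = init_weights(rng)
losses = [bp_step(p, x, t, lr=0.01) for _ in range(500)]
print(losses[-1] < losses[0])                      # True: the error decreases
```

The learning rate `lr` is exactly the fixed parameter that the text says must be guessed; the experiment below tries 0.005, 0.01, 0.05, and 0.1.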


Table 1. Computations Required for Each Iteration for the (1 6 1) Network With T = 16

                             Conventional BP   Optimum-learning-rate
                             algorithm         BP algorithm
Multiplication/division            853              1302
Addition/subtraction               512               830
Total                             1365              2132

The experiment begins with two sets of initial weights. For each set, the conventional BP algorithm is run with learning rates of 0.005, 0.01, 0.05, and 0.1, giving four learning processes per set and eight in total. For each learning process, the relative rms error of the network output (i.e., the ratio of the norm of the output error matrix to the norm of the ideal output matrix) is recorded as a function of the number of iterations. The results for the two sets of initial weights are plotted in Figures 3(a) and 3(b), respectively, together with the corresponding learning process of the optimum-learning-rate algorithm for each set.
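The error measure can be sketched as below. The text does not state the norm or the decibel convention; this sketch assumes the Frobenius norm and 20 log10 of the norm ratio (an amplitude ratio), and the function name is hypothetical.

```python
import numpy as np

def relative_error_db(outputs, targets):
    """Relative rms error ||Y - D|| / ||D|| expressed in dB.

    Assumes the Frobenius norm and the amplitude convention 20*log10(ratio).
    """
    outputs = np.asarray(outputs, dtype=float)
    targets = np.asarray(targets, dtype=float)
    ratio = np.linalg.norm(outputs - targets) / np.linalg.norm(targets)
    return 20.0 * np.log10(ratio)

# An output error one tenth the size of the ideal output gives -20 dB
print(round(relative_error_db([1.1, 0.0], [1.0, 0.0]), 6))   # -20.0
```

Under this convention, -20 dB corresponds to an error norm of 10 percent of the ideal-output norm and -40 dB to 1 percent.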

For group A (Figure 3(a)), the four learning curves of the conventional BP algorithm, one for each learning rate, essentially coincide, and the convergence error is approximately -5 dB. (Experimentally, there was no apparent improvement even when the number of iterations was increased to over 1,000.) In comparison, the learning curve of the optimum-learning-rate BP algorithm drops significantly within about 100 iterations; near 300 iterations it reaches a satisfactory level, with a convergence error under -20 dB. For group B (Figure 3(b)), the four conventional BP curves are quite different. The curve for a learning rate of 0.01 has the best convergence rate and error: near 500 iterations, its convergence error is about -40 dB. The optimum-learning-rate BP algorithm is still better than all four conventional BP curves. In approximately 500 iterations, its convergence error declines to below -50 dB, an improvement in mapping accuracy of 10 dB.
[Figure 3, panels (a) and (b), not reproduced; horizontal axis: number of iterations]
Figure 3. Comparison of the Learning Processes of the Conventional BP Algorithm at Various Learning Rates and of the Optimum-Learning-Rate BP Algorithm; (a) and (b) Are the Learning Curves Corresponding to Initial Weights A and B


If the number of arithmetic operations required to reach a specified convergence error is used to judge convergence speed, then in group A the optimum-learning-rate BP algorithm requires $5.3 \times 10^5$ arithmetic operations to reach -20 dB, whereas the conventional BP algorithm reaches only the -5 dB level after $2.7 \times 10^6$ operations. In group B, the conventional BP algorithm and the optimum-learning-rate BP algorithm require $2.7 \times 10^6$ and $3.2 \times 10^5$ operations, respectively, to reach a convergence error of -40 dB. The optimum-learning-rate BP algorithm thus saves approximately an order of magnitude in arithmetic operations.
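The per-iteration overhead and the group B saving follow directly from Table 1 and the operation counts quoted above; the short calculation below only reproduces that arithmetic (the variable names are ours).

```python
# Per-iteration operation totals from Table 1
OPS_CONVENTIONAL = 1365
OPS_OPTIMUM = 2132

# Extra cost of learning-rate optimization per iteration, relative to conventional BP
overhead = OPS_OPTIMUM / OPS_CONVENTIONAL - 1.0

# Group B: operations needed to reach -40 dB
ops_to_40db = {"conventional": 2.7e6, "optimum": 3.2e5}
saving = ops_to_40db["conventional"] / ops_to_40db["optimum"]

print(f"overhead per iteration: {overhead:.1%}")   # 56.2%
print(f"group B saving factor: {saving:.2f}")       # 8.44
```

A factor of about 8.4 is what the text rounds to "approximately an order of magnitude", despite each optimum-learning-rate iteration costing about 56 percent more.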


V.  Discussion 

Based on an analysis of the above experimental results, the following conclusions can be reached.

(1) When the conventional BP algorithm is used to train a multilayer feed-forward network, it is very difficult to guess parameters that produce a satisfactory convergence speed and convergence error, because many factors (such as the initial weights) affect the choice of parameters.


(2) Compared with the conventional BP algorithm, the optimum-learning-rate BP algorithm significantly improves the convergence rate and error of the learning process. The experiment cannot rule out the possibility that some parameter value, among numerous guesses, would give the conventional BP algorithm an even more satisfactory convergence rate and error than the optimum-learning-rate BP algorithm. However, such a parameter could be found only by comparing the results of numerous learning processes after repeated guessing and learning, and this kind of parameter fine-tuning is exactly what we wish to avoid in artificial neural networks.

(3) In each iteration, learning-rate optimization adds approximately 56 percent to the computation of a conventional BP iteration (2132 versus 1365 operations in Table 1). However, this additional load can greatly reduce the number of iterations and significantly improve the mapping accuracy, thereby accelerating the overall learning process.

(4) The optimum-learning-rate BP algorithm retains the characteristics of a gradient algorithm, so the selection of the initial weights remains important. Usually, the initial weights may be changed several times during training in order to improve the mapping generated by the network.

Taken as a whole, these four points underlie the fast nature of the optimum-learning-rate BP algorithm.
Abstract

This paper presents a new scheme for the optical implementation of a bipolar WTA (Winner-Take-All) triple-layer neural network model, together with its experimental results. In the fully bipolar mode, the input to the ith neuron in the middle layer consists of two parts: one is the input to that neuron in the unipolar mode, and the other is one half of the sum of the components of the reversed pattern corresponding to the stored pattern. The constant 1/2 is implemented by dividing each input neuron into two equal parts, one opaque and one transparent. Experimentally, the bipolar system was found to have higher storage capacity and addressability than a unipolar WTA system.

Key  words:  WTA  neural  network  model,  bipolar  neural 
state,  pattern  recognition,  multi-channel  inner-product 
hologram. 

I.  Introduction 

In recent years, neural networks have been a hot research topic worldwide. Many new models have been presented,1 one of which is the WTA neural network model. The WTA neural network is believed to be a mechanism that exists in the brain.2,3 It not only has high storage capacity and addressability but can also implement self-association and mutual association.4-6 Furthermore, the connection weights of a WTA network are 0 and 1 (unipolar) or +1 and -1 (bipolar). A bipolar WTA model has larger storage capacity and higher addressability than its unipolar counterpart. Nevertheless, because negative neuron inputs and interconnects are involved, it is somewhat difficult to implement optically. The implementation of bipolar connections has been reported in several studies.7-10 However, there has been no report on the simultaneous implementation of bipolar neuron states and bipolar interconnections. Wang Xuming and Mu Guoguang proposed a simple scheme to implement a bipolar Hopfield model by adding an extra row of elements to the interconnect mask so as to take advantage of a preset distributed background.11 I. Shariv used a birefringent crystal to split a polarized light beam into two orthogonally polarized beams representing the positive and negative neuron states, and implemented a bipolar triple-layer network in a double-pass system.12 In this work, our original unipolar optoelectronic hybrid WTA pattern-recognition system13 has been improved, and an optical pattern-recognition system implementing the bipolar WTA neural network is presented. When one stored pattern contains another stored pattern, a unipolar system cannot accurately recognize the contained pattern. For example, when shi [0577] and wang [3769] are both stored patterns and the input is shi, the original unipolar system cannot determine whether the input is shi or wang. A bipolar system can recognize it. Therefore, the storage capacity and addressability of the network are significantly increased.


II. Bipolar WTA Neural Network Model

1.  WTA  Neural  Network  Model 

Figure 1 shows the architecture of the WTA neural network model. It is a triple-layer neural network with a WTA hidden layer.4-6 Each neuron in the hidden layer corresponds to a stored pattern, so the number of hidden neurons equals the number of stored patterns $M$. The connection between the $j$th neuron in the hidden layer and the $i$th neuron in the input layer is $W_{ij}$. The interconnect between the $i$th and $j$th neurons in the hidden layer is $T_{ij}$. The connection between the $j$th neuron in the hidden layer and the $i$th neuron in the output layer is $W'_{ij}$. When $W_{ij}$ equals $W'_{ij}$, the network implements auto-association based on content addressability; otherwise, it implements hetero-association.

[Figure 1 not reproduced; the diagram labels the layers N, M, and N']

Figure 1. WTA Neural Network Model


The input pattern is weighted by $W_{ij}$ and summed to determine its similarity to each stored pattern; this similarity is the input to the corresponding neuron in the hidden layer. When $W_{ij}$ is the component $V_i^j$ of the $i$th neuron in the $j$th stored pattern, the similarity of the input pattern to the stored pattern is their inner product, i.e., the input to the $j$th neuron in the hidden layer is the inner product of the $j$th stored pattern and the input pattern. In the hidden layer, because of the connections $T_{ij}$, only the neuron with the largest input has a non-zero output; all the others output 0, which completes the WTA operation. When the inner products of two or more stored patterns with the input pattern are equal, the network cannot recognize the input properly: its output will include all or none of them. Hence, one of the primary ways to raise the storage capacity and addressability of the model is to alter the weights $W_{ij}$ so as to eliminate the possibility that more than one stored pattern has the same inner product with the input pattern. In the bipolar WTA model, $W_{ij} = X_{ij}$, where $X_{ij}$ is either +1 or -1. Compared with the unipolar WTA model ($W_{ij} = V_{ij}$, either 1 or 0), this eliminates the problem that the system cannot accurately distinguish between two patterns when one pattern is contained in the other. Therefore, the bipolar WTA model has larger storage capacity and better content addressability.
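The containment problem can be checked with a toy example. The 5 x 5 bitmaps below are our own crude stand-ins for shi and wang, chosen so that every "on" pixel of shi is also "on" in wang; they are not the patterns used in the experiment. A tie is resolved here as "no winner", one of the two outcomes the text mentions, and `wta` is a hypothetical helper.

```python
import numpy as np

def wta(scores):
    """Winner-take-all over hidden-layer inputs: 1 for a unique maximum,
    all zeros if several stored patterns tie (the network cannot decide)."""
    scores = np.asarray(scores)
    out = np.zeros(len(scores), dtype=int)
    winners = np.flatnonzero(scores == scores.max())
    if len(winners) == 1:
        out[winners[0]] = 1
    return out

# Toy 5 x 5 stand-ins: every "on" pixel of SHI is also "on" in WANG
SHI = np.array([[0, 0, 1, 0, 0],
                [0, 0, 1, 0, 0],
                [1, 1, 1, 1, 1],
                [0, 0, 1, 0, 0],
                [0, 0, 1, 0, 0]])
WANG = np.array([[1, 1, 1, 1, 1],
                 [0, 0, 1, 0, 0],
                 [1, 1, 1, 1, 1],
                 [0, 0, 1, 0, 0],
                 [1, 1, 1, 1, 1]])

stored = np.stack([SHI.ravel(), WANG.ravel()])   # unipolar weights W_ij = V_ij
x = SHI.ravel()                                  # input pattern: shi

unipolar = stored @ x                            # 9 and 9: a tie
bipolar = (2 * stored - 1) @ (2 * x - 1)         # 25 and 9: shi wins
print(wta(unipolar), wta(bipolar))               # [0 0] [1 0]
```

In the bipolar case every pixel that is "on" in wang but "off" in shi now counts against wang, which is why the tie disappears.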


2. Bipolar Neuron and Interconnect Weight in WTA

The stored pattern in this work is a two-dimensional array. The neuron $hl$ in the hidden layer and the neuron $jk$ in the input layer are connected by a four-dimensional tensor $W_{jk}^{hl}$. Let $X_{jk}^{hl}$ and $X_{jk}$ represent the bipolar form of the $hl$th stored pattern and the $jk$th component of the input pattern, respectively, and let the interconnect $W_{jk}^{hl}$ be $X_{jk}^{hl}$. Then the input $\mu_{hl}$ to neuron $hl$ of the hidden layer is

$$\mu_{hl} = \sum_{j=1}^{J}\sum_{k=1}^{K} W_{jk}^{hl} X_{jk} = \sum_{j=1}^{J}\sum_{k=1}^{K} X_{jk}^{hl} X_{jk} \qquad (1)$$

where $h = 1, 2, \ldots, H$ and $l = 1, 2, \ldots, L$ give the position of a stored pattern in an $H \times L$ array, $H \times L = M$ is the number of stored patterns, and $j = 1, 2, \ldots, J$ and $k = 1, 2, \ldots, K$, with $J \times K = N$, indicate that each stored pattern is a $J \times K$ matrix of $N$ neurons.

Any bipolar pattern $(X_{jk})$ can be expressed in terms of its unipolar form $(V_{jk})$:

$$X_{jk} = 2V_{jk} - 1 \qquad (2)$$

Substituting equation (2) into (1), we have

$$\mu_{hl} = 4\sum_{j=1}^{J}\sum_{k=1}^{K}\left(V_{jk}^{hl}V_{jk} - \tfrac{1}{2}V_{jk}^{hl} - \tfrac{1}{2}V_{jk}\right) + N \qquad (3)$$

The relation between a pattern $(V_{jk}^{hl})$ and its corresponding reversed pattern $(\bar{V}_{jk}^{hl})$ is

$$\bar{V}_{jk}^{hl} = 1 - V_{jk}^{hl} \qquad (4)$$

Hence,

$$\sum_{j=1}^{J}\sum_{k=1}^{K} V_{jk}^{hl} = N - \sum_{j=1}^{J}\sum_{k=1}^{K}\bar{V}_{jk}^{hl} \qquad (5)$$

Substituting equation (5) into equation (3), we get

$$\mu_{hl} = 4\sum_{j=1}^{J}\sum_{k=1}^{K}\left(V_{jk}^{hl}V_{jk} + \tfrac{1}{2}\bar{V}_{jk}^{hl} - \tfrac{1}{2}V_{jk}\right) - N \qquad (6)$$

For a network with a given number of neurons and a given input pattern $(V_{jk})$,

$$C = N + 2\sum_{j=1}^{J}\sum_{k=1}^{K} V_{jk} \qquad (7)$$

is constant. Hence, equation (6) can be written as

$$\mu_{hl} = 4\sum_{j=1}^{J}\sum_{k=1}^{K}\left(V_{jk}^{hl}V_{jk} + \tfrac{1}{2}\bar{V}_{jk}^{hl}\right) - C \qquad (8)$$
We already know that each neuron in the hidden layer corresponds to a stored pattern and that only the neuron with the largest input has a non-zero output. Clearly, the output of a hidden-layer neuron depends only on the relative values of $\mu_{hl}$, not on their absolute values. Since $\mu_{hl}$ in equation (8) is a strictly increasing function of the summation, the constant $C$ and the factor 4 can be omitted without changing the winner, and $\mu_{hl}$ can be simplified to

$$\mu'_{hl} = \sum_{j=1}^{J}\sum_{k=1}^{K}\left(V_{jk}^{hl}V_{jk} + \tfrac{1}{2}\bar{V}_{jk}^{hl}\right) \qquad (9)$$
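Equations (1) through (9) can be checked numerically on random patterns. This sketch uses NumPy with arbitrarily chosen toy sizes; it confirms that the bipolar inner product equals $4\mu'_{hl} - C$ and that both select the same winner.

```python
import numpy as np

rng = np.random.default_rng(1)
H, L, J, K = 2, 3, 4, 4                                  # toy sizes, M = H*L stored patterns
N = J * K
V_stored = rng.integers(0, 2, size=(H * L, J, K))        # unipolar stored patterns V^hl
V_in = rng.integers(0, 2, size=(J, K))                   # unipolar input pattern V

# Equation (1) with X = 2V - 1 from equation (2): bipolar inner products
mu = np.sum((2 * V_stored - 1) * (2 * V_in - 1), axis=(1, 2))

# Equation (9): unipolar inner product plus half the sum of the reversed pattern
V_rev = 1 - V_stored                                     # equation (4)
mu_prime = np.sum(V_stored * V_in, axis=(1, 2)) + 0.5 * np.sum(V_rev, axis=(1, 2))

C = N + 2 * V_in.sum()                                   # equation (7)
assert np.allclose(mu, 4 * mu_prime - C)                 # equation (8)
assert np.argmax(mu) == np.argmax(mu_prime)              # same WTA winner
```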

According to equation (9), this bipolar WTA model can be implemented with unipolar patterns. The input to the hidden layer of the bipolar neural network consists of two parts: one is the input to the corresponding unipolar neural network, and the other is one half of the sum of all components of the reversed unipolar stored pattern. The second part can be implemented by placing, next to the input pattern, a translucent plate with the same layout as the stored pattern (Figure 2), and by including the reversed unipolar pattern in the interconnection (Figure 3). The reduction of the transmittance to 1/2 is implemented by dividing each neuron into two parts, one with a transmittance of 1 and the other with a transmittance of 0. This avoids the difficulty of accurately controlling a transmittance of 1/2.
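The split-cell trick can be illustrated with a toy calculation. The sketch assumes that each neuron cell is divided into two equal sub-cells and that the detected light is proportional to the open area multiplied by the reversed-pattern mask; the function name is hypothetical.

```python
import numpy as np

def half_plate_term(V_rev, subcells=2):
    """Light passed by a plate whose cells are half clear and half opaque,
    behind the reversed-pattern mask V_rev, in units of one fully open cell."""
    J, K = V_rev.shape
    plate = np.zeros((J, K * subcells))
    plate[:, ::subcells] = 1.0                    # one clear sub-cell per neuron
    mask = np.repeat(V_rev, subcells, axis=1)     # each mask element covers its whole cell
    return float(np.sum(plate * mask)) / subcells

V_rev = np.array([[1, 0, 1], [0, 1, 1]])
print(half_plate_term(V_rev), 0.5 * V_rev.sum())  # both equal 2.0
```

Because every cell is either fully open or fully opaque, the factor 1/2 comes from geometry alone rather than from a precisely controlled gray level.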


Figure 2. Input Pattern [not reproduced]

Figure 3. Stored and Reversed Stored Patterns for Interconnect Holographic Recording [not reproduced]


The interconnects $T$ among the neurons of the hidden layer are implemented by an electronic network, as shown in Figure 4. Their effect depends on which neuron receives the maximum input: in the iteration process, the neuron with the highest input is excited and outputs 1, while the others are inhibited and output 0.
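The circuit of Figure 4 is not described in detail. One common way to realize an iterative winner-take-all by mutual inhibition is a MAXNET-style recurrence, sketched below as an assumed stand-in rather than the authors' circuit: each unit subtracts a small multiple `eps` of the other units' activity until only one remains active.

```python
import numpy as np

def maxnet(inputs, eps=None, max_iter=1000):
    """MAXNET-style winner-take-all by mutual inhibition (assumed stand-in).

    Each unit keeps its own activity and is inhibited by eps times the total
    activity of the other units; activities are clipped at zero. Returns 1 for
    the surviving unit(s) and 0 elsewhere.
    """
    a = np.asarray(inputs, dtype=float).copy()
    if eps is None:
        eps = 1.0 / (len(a) + 1)        # kept below 1/M, the usual MAXNET condition
    for _ in range(max_iter):
        if np.count_nonzero(a) <= 1:
            break
        a = np.maximum(0.0, a - eps * (a.sum() - a))
    return (a > 0).astype(int)

print(maxnet([0.2, 0.5, 0.9, 0.4]))     # [0 0 1 0]
```

With exactly equal largest inputs the recurrence never separates them, so several outputs stay at 1, matching the text's remark that ties cannot be resolved.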


When the interconnection $W'^{hl}_{jk}$ between the hidden layer and the output layer equals $W^{hl}_{jk}$, the system performs self-association; otherwise, it performs mutual association. When one set of output interconnections is chosen equal to $W^{hl}_{jk}$ and another set is chosen different from it, the system performs self-association and mutual association at the same time.


III.  Bipolar  Multi-Channel  Interconnection  Holography 

Because of advantages such as high storage capacity, three-dimensional spatial distribution, and distributed storage of multiple three-dimensional objects, holographic memory is widely used in optical neural networks,16 where holograms usually serve as the interconnections. Figure 5 shows the optical configuration of the bipolar interconnect holographic system. $P_1$ is the input plane, on which all the stored patterns and reversed stored patterns are arranged in pairs, with different stored patterns separated in space. $P_2$ is the plane where a holographic plate is placed to record the interconnection. The distances from lens $L_2$ to $P_1$ and to $P_2$ are $d_1$ and $d_2$, respectively, and they satisfy the imaging relation

$$\frac{1}{d_1} + \frac{1}{d_2} = \frac{1}{f} \qquad (10)$$

where $f$ is the focal length of $L_2$.
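As a numerical aid, equation (10) can be solved for $d_2$. The sketch also returns the standard thin-lens lateral magnification $-d_2/d_1$, which is not stated in the text; the focal length and distances in the example are illustrative values only.

```python
def image_distance(d1, f):
    """Solve 1/d1 + 1/d2 = 1/f (equation (10)) for d2, in the same length units."""
    if d1 == f:
        raise ValueError("an object in the front focal plane images to infinity")
    return 1.0 / (1.0 / f - 1.0 / d1)

def lateral_magnification(d1, d2):
    """Thin-lens lateral magnification; the minus sign means the image is inverted."""
    return -d2 / d1

d2 = image_distance(300.0, 200.0)                                  # mm, illustrative values
print(round(d2, 6), round(lateral_magnification(300.0, d2), 6))    # 600.0 -2.0
```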


Figure  5.  Optical  Configuration  of  Interconnected 
Holography 
Assume that the continuous form of the $hl$th stored pattern pair is $V^{hl}(j - a_h, k - a_l) + \bar{V}^{hl}(j - \bar{a}_h, k - \bar{a}_l)$; i.e., $V^{hl}(j - a_h, k - a_l)$ is the continuous form of the $hl$th stored pattern $(V_{jk}^{hl})$, located at $(a_h, a_l)$ on the input plane, and $\bar{V}^{hl}(j - \bar{a}_h, k - \bar{a}_l)$ is its reversed pattern, located at $(\bar{a}_h, \bar{a}_l)$ on $P_1$. Then the complex optical field distribution on $P_2$ is

$$O_{hl}(\alpha,\beta) = c\left\{\left[V^{hl}(\alpha - a_h, \beta - a_l) + \bar{V}^{hl}(\alpha - \bar{a}_h, \beta - \bar{a}_l)\right] * h_{d_1}(\alpha,\beta)\, T_2(\alpha,\beta)\right\} * h_{d_2}(\alpha,\beta) \qquad (11)$$

Here, $h_{d_1}$ and $h_{d_2}$ are the free-space impulse response functions for propagation over the distances $d_1$ and $d_2$, respectively, $T_2(\alpha,\beta)$ represents the phase transformation introduced by lens $L_2$, $*$ denotes convolution, and $c$ is a constant. Then the complex optical field of the input object on plane $P_2$ is

$$O(\alpha,\beta) = \sum_{h=1}^{H}\sum_{l=1}^{L} O_{hl}(\alpha,\beta) \qquad (12)$$

A convergent spherical wave is used as the reference. Its complex optical field distribution on $P_2$ is

$$R(\alpha,\beta) = \exp\left\{-\frac{ik}{2d_0}\left[(\alpha - h_0)^2 + \beta^2\right]\right\} \qquad (13)$$

The total complex optical field on plane $P_2$ is

$$O(\alpha,\beta) + R(\alpha,\beta) = \sum_{h=1}^{H}\sum_{l=1}^{L} O_{hl}(\alpha,\beta) + R(\alpha,\beta) \qquad (14)$$


The optical field is recorded holographically. After the exposed plate is processed, the part of its amplitude transmittance related to first-order diffraction is

$$t(\alpha,\beta) = O^*(\alpha,\beta)\,R(\alpha,\beta) = \sum_{h=1}^{H}\sum_{l=1}^{L} O_{hl}^*(\alpha,\beta)\,R(\alpha,\beta) \qquad (15)$$


The  stored  patterns  in  this  work  contain  nine  Chinese  characters;  their  holograms  are  shown  in  Figure  3. 

IV.  Experimental  Results 

Figure 6 shows the optoelectronic WTA pattern-recognition system. $P_1$ is its input plane. A grating is placed at $P_2$ behind the imaging lens $L_2$. $P_3$ is the interconnect hologram, and $P_4$ is the output plane of the holographic plate. $V'(j,k)$ is the input on the input plane; it consists of a unipolar input pattern and a translucent pattern, as shown in Figure 2. The complex optical field distribution on plane $P_3$ is then

$$E(\alpha,\beta) = \left\{\left[V'(\alpha,\beta) * h_{d_1}(\alpha,\beta)\right] T_1(\alpha,\beta)\, T_2(\alpha,\beta)\right\} * h_{d_2}(\alpha,\beta)\; t(\alpha,\beta) \qquad (16)$$

where

$$T_1(\alpha,\beta) = \tfrac{1}{4}\left[1 + \operatorname{sgn}(\cos\omega_0\alpha)\right]\left[1 + \operatorname{sgn}(\cos\omega_0\beta)\right] \qquad (17)$$

is the transmittance of the grating, $\omega_0 = 2\pi a/(\lambda d_2)$, and $a$ is the distance between adjacent stored patterns. The complex optical field distribution at the output plane $P_4$ of the interconnect associative memory is


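$$U(\alpha,\beta) \propto \sum_{h=1}^{H}\sum_{l=1}^{L}\left[V'\!\left(-\tfrac{d_1}{d_2}\alpha + a_h, -\tfrac{d_1}{d_2}\beta + a_l\right) V^{hl}\!\left(-\tfrac{d_1}{d_2}\alpha + a_h, -\tfrac{d_1}{d_2}\beta + a_l\right) + \tfrac{1}{2}\bar{V}^{hl}\!\left(-\tfrac{d_1}{d_2}\alpha + \bar{a}_h, -\tfrac{d_1}{d_2}\beta + \bar{a}_l\right)\right] \qquad (18)$$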
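Equation (17) describes a two-dimensional binary grating that is open (transmittance 1) only where both cosines are positive, so one quarter of its area transmits. The sketch evaluates it with NumPy; the sample values of $a$, $\lambda$, and $d_2$ are arbitrary and only fix the period $\lambda d_2 / a$, and the function name is hypothetical.

```python
import numpy as np

def grating_transmittance(alpha, beta, a, wavelength, d2):
    """Crossed binary grating of equation (17), with omega0 = 2*pi*a/(wavelength*d2).

    Each factor is 1 where the cosine is positive and 0 where it is negative
    (np.sign gives 0 exactly on a zero crossing, i.e. a factor of 1/2 there).
    """
    omega0 = 2.0 * np.pi * a / (wavelength * d2)
    fx = 1.0 + np.sign(np.cos(omega0 * alpha))
    fy = 1.0 + np.sign(np.cos(omega0 * beta))
    return 0.25 * fx * fy

# Sample one period in each direction (a = wavelength = d2 = 1 gives period 1)
s = (np.arange(200) + 0.5) / 200
AL, BE = np.meshgrid(s, s)
T1 = grating_transmittance(AL, BE, a=1.0, wavelength=1.0, d2=1.0)
print(np.unique(T1), T1.mean())   # [0. 1.] 0.25
```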


Clearly, the bipolar inner product of the input pattern $V'(j,k)$ and the stored pattern $V^{hl}(j,k)$ can be obtained on plane $P_4$. It is located at $(h_0 + d_0 d_1 a'_h/d_2,\; d_0 d_1 a'_l/d_2)$, where $a'_h = (a_h + \bar{a}_h)/2 = ha$ and $a'_l = (a_l + \bar{a}_l)/2 = la$. A phototransistor converts the intensity of the inner product into an electrical signal, which is transmitted to the WTA circuit of the hidden layer. Through the interconnections, the winner takes all: only the neuron with the largest input gives a high-voltage output, and the others give a low-voltage output. The high-voltage signal drives an LED that illuminates the corresponding self-associative or mutual-associative pattern, thereby providing associative recognition. Figure 7 [photograph not reproduced] shows the results. The use of bipolar connections and bipolar neuron inputs overcomes the problem that a unipolar WTA system cannot accurately identify a pattern contained in another pattern. The storage capacity and addressability of the system are also improved.

Figure 6. WTA Pattern Recognition System [diagram not reproduced]


V.  Conclusions 

Both theoretical analysis and experimental results show that a bipolar WTA model has a larger storage capacity and higher addressability. It can perform not only self-association but also mutual association. In addition, it is invariant to illumination level and gray scale, because the output of the hidden-layer neurons depends only on the relative, not the absolute, values of their inputs. If a preprocessor is placed in front of the system and the reference patterns are replaced by corresponding invariant features, the system may also become invariant to in-plane and out-of-plane rotation, scale, and displacement.
[Text]

Abstract 

The Fourier descriptor method is extended to produce a set of invariants under affine transformation. These invariants are used to train a triple-layer perceptron network to identify and classify aircraft. An accelerated learning algorithm is adopted to significantly reduce the learning time. Finally, results are presented for the use of such a neural network to identify and classify aircraft, together with an analysis of its noise tolerance.

Key words: affine transformation, invariant, neural network, backpropagation, classification.

I.  Introduction 

Identification of two-dimensional patterns is an important subject. Usually, this type of problem is treated under similarity transformation, i.e., translation, rotation, and magnification or reduction, under which the shape intuitively remains invariant. When a planar pattern (e.g., an aircraft) is observed from different viewing angles, however, its shape does not remain unchanged, because it undergoes an affine transformation. To address this, K. Arbter1 proposed identifying the pattern by finding a set of invariants through Fourier transformation; these invariants are complex numbers. In this work, a new method is presented for finding invariants that are expressed as real numbers and do not take phase into consideration when the Fourier transformation is performed. These invariants can be used to train a three-layer perceptron to identify and classify aircraft. One of the major problems in using a neural network is the slow training process; to this end, an accelerated learning algorithm is introduced to significantly speed up learning. Finally, the noise tolerance of this neural network classifier is analyzed.

II.  Brief  Introduction  to  Affine  Transformation 

Any three-dimensional motion of a rigid body can be treated as a rotation about an axis passing through the origin plus a translation along the three axes.3 The equation of motion can be expressed as follows:

$$\begin{bmatrix} x' \\ y' \\ z' \end{bmatrix} = R\begin{bmatrix} x \\ y \\ z \end{bmatrix} + \begin{bmatrix} d_x \\ d_y \\ d_z \end{bmatrix} \qquad (1)$$

where $(d_x, d_y, d_z)^T$ is the translation, $R$ is a $3 \times 3$ rotation matrix, and $(x, y, z)^T$ and $(x', y', z')^T$ are the coordinates of a point on the body before and after the movement, respectively. Assume that the projection plane is $z = 1$; then the projections are

$$X' = x'/z', \quad Y' = y'/z', \quad X = x/z, \quad Y = y/z \qquad (2)$$

If the aircraft is far from the image plane and varies very little in the $z$ direction, i.e., $z' \approx z$, then the transformation on the projection plane can be expressed as follows:


Or, in vector form:

$$W' = AW + B \qquad (4)$$

where $A$ is a $2 \times 2$ matrix with $\det(A) \neq 0$, and $W'$, $W$, and $B$ are two-dimensional column vectors. This is an affine transformation.
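A quick numerical check shows why the approximation $z' \approx z$ matters: for a distant planar object the projected motion is almost exactly affine, while for a nearby object a least-squares affine fit leaves a much larger residual. The rotation angles, translation, depths, and helper names are all illustrative assumptions (NumPy).

```python
import numpy as np

def rotation(theta, phi):
    """Rotation about z by theta, then about x by phi (an arbitrary choice of R)."""
    ct, st, cp, sp = np.cos(theta), np.sin(theta), np.cos(phi), np.sin(phi)
    Rz = np.array([[ct, -st, 0.0], [st, ct, 0.0], [0.0, 0.0, 1.0]])
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]])
    return Rx @ Rz

def project(P):
    """Central projection onto the plane z = 1, equation (2)."""
    return P[:, :2] / P[:, 2:3]

def affine_fit_residual(z0, n=50, seed=0):
    """Relative max residual of a least-squares fit W' ~ A W + B (equation (4))."""
    rng = np.random.default_rng(seed)
    pts = np.column_stack([rng.uniform(-1.0, 1.0, (n, 2)), np.full(n, z0)])  # planar object at depth z0
    moved = pts @ rotation(0.3, 0.2).T + np.array([0.5, -0.2, 0.0])          # equation (1)
    W, W2 = project(pts), project(moved)
    design = np.column_stack([W, np.ones(n)])
    coef, *_ = np.linalg.lstsq(design, W2, rcond=None)
    return np.abs(design @ coef - W2).max() / np.abs(W2).max()

print(affine_fit_residual(1000.0) < affine_fit_residual(3.0))   # True: far objects are nearly affine
```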


If the matrix $A$ can be expressed as $A = aU$, where $a$ is a constant and $U$ is an orthogonal matrix, then equation (4) becomes a similarity transformation. An affine transformation places no requirement on $A$ other than $\det(A) \neq 0$, so similarity transformations are a special case of affine transformations. Figure 1 shows a few examples of such transformations.

[Figure 1 not reproduced; panels: reference pattern, similarity transformation, affine transformation, non-affine transformation]

Figure 1. Explanation of Affine Transformation


III.  Determination  of  Invariants 

The key to performing the Fourier transformation is to find a parameter that varies linearly under an arbitrary affine transformation. Arbter1,2 provided the parameter $t$:

$$t = \int_{c} (x\,dy - y\,dx) \qquad (5)$$

where $c$ is the contour of the planar pattern, and proved that the following properties hold when $B = 0$ in equation (4).

(1) The parameter varies linearly under affine transformation; i.e., if $t'$ is the parameter of the pattern to be identified and $t$ is the parameter of the reference pattern, then there exist constants $c$ and $d$ such that $t' = c(t + d)$.
(2) This parameter is independent of the selection of the initial point on the curve.
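For a polygonal contour the integral in equation (5) can be evaluated edge by edge, since along a straight edge from $(x_0, y_0)$ to $(x_1, y_1)$ it equals $x_0 y_1 - y_0 x_1$. The sketch below (NumPy; names hypothetical) computes the cumulative parameter and checks property (1) for $B = 0$: each edge term is multiplied by $\det(A)$, so $t' = \det(A)\,t$ when the same initial vertex is used.

```python
import numpy as np

def area_parameter(contour):
    """Cumulative t of equation (5) along a closed polygon (n x 2 vertex array).

    t[0] = 0 at the initial vertex; t[-1] is the value after one full loop.
    """
    x, y = contour[:, 0], contour[:, 1]
    xn, yn = np.roll(x, -1), np.roll(y, -1)   # next vertex, wrapping around
    dt = x * yn - y * xn                      # exact integral of x dy - y dx over each edge
    return np.concatenate([[0.0], np.cumsum(dt)])

square = np.array([[1.0, 1.0], [-1.0, 1.0], [-1.0, -1.0], [1.0, -1.0]])
A = np.array([[2.0, 1.0], [0.5, 1.5]])        # det(A) = 2.5
t = area_parameter(square)
t_affine = area_parameter(square @ A.T)       # W' = A W applied to every vertex
print(np.allclose(t_affine, np.linalg.det(A) * t))   # True
```

Starting from a different vertex only adds a constant offset (modulo the full-loop value), which is the shift $d$ in property (1).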

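When B ≠ 0, the origin of the coordinate system can be moved to

$$B = \frac{\iint_D W\,dx\,dy}{\iint_D dx\,dy},$$

the centroid of the region D enclosed by the contour. This can be proved as follows. Suppose the reference pattern satisfies $\iint_D W\,dx\,dy \big/ \iint_D dx\,dy = 0$, and a transformation W' = AW + B changes the enclosed region from D to D'. The Jacobian of the transformation is |det A|, so

$$B' = \frac{\iint_{D'} W'\,dx'\,dy'}{\iint_{D'} dx'\,dy'} = \frac{\iint_D (AW + B)\,|\det A|\,dx\,dy}{\iint_D |\det A|\,dx\,dy} = A \cdot 0 + B = B.$$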


Theorem 1. The translation in an affine transformation can be determined by

$$B = \frac{\iint_D W\,dx\,dy}{\iint_D dx\,dy} \qquad (6)$$
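For a polygonal contour, equation (6) is the area centroid of the enclosed region. The sketch below evaluates it with the standard shoelace formulas; the helper `region_centroid` is hypothetical. It then checks that the centroid moves exactly as Theorem 1 predicts: centroid(AW + B) = A·centroid(W) + B. The formulas use signed areas, so they remain valid when det A < 0 reverses the orientation.

```python
import numpy as np

def region_centroid(contour):
    """Centroid of the region enclosed by a simple polygon (shoelace formulas)."""
    x, y = contour[:, 0], contour[:, 1]
    xn, yn = np.roll(x, -1), np.roll(y, -1)
    cross = x * yn - xn * y                  # twice the signed area of each fan triangle
    area = cross.sum() / 2.0
    cx = ((x + xn) * cross).sum() / (6.0 * area)
    cy = ((y + yn) * cross).sum() / (6.0 * area)
    return np.array([cx, cy])

# An L-shaped (non-convex) region and an arbitrary affine map.
L_shape = np.array([[0, 0], [2, 0], [2, 1], [1, 1], [1, 2], [0, 2]], dtype=float)
A = np.array([[1.2, -0.5], [0.3, 0.9]])
B = np.array([4.0, -2.0])
print(region_centroid(L_shape @ A.T + B), A @ region_centroid(L_shape) + B)
```

Subtracting this centroid from every contour point puts the pattern in the B = 0 setting used in the rest of this section.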

Express an arbitrary point on the contour as the vector function $W(t) = [x(t),\ y(t)]^{T}$. Performing the Fourier transformation on x(t) and y(t) then yields a matrix of Fourier series coefficients.
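In equation (4), when B = 0, the discussion can be carried out in two cases.

1) W'(t) = AW(t). Performing the Fourier transformation on both sides gives

$$\begin{bmatrix} X_n'^{r} + iX_n'^{i} \\ Y_n'^{r} + iY_n'^{i} \end{bmatrix} = A \begin{bmatrix} X_n^{r} + iX_n^{i} \\ Y_n^{r} + iY_n^{i} \end{bmatrix}$$

where X_n^r and X_n^i are the real and imaginary parts of the nth-order coefficient X_n, and likewise for Y_n. Equating real and imaginary parts, we have

$$\begin{bmatrix} X_n'^{r} & X_n'^{i} \\ Y_n'^{r} & Y_n'^{i} \end{bmatrix} = A \begin{bmatrix} X_n^{r} & X_n^{i} \\ Y_n^{r} & Y_n^{i} \end{bmatrix}$$

Therefore,

$$\det\begin{bmatrix} X_n'^{r} & X_n'^{i} \\ Y_n'^{r} & Y_n'^{i} \end{bmatrix} = \det A \cdot \det\begin{bmatrix} X_n^{r} & X_n^{i} \\ Y_n^{r} & Y_n^{i} \end{bmatrix}$$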

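2) W'(t') = W(t) with t' = c(t + d). Performing the Fourier transformation on both sides, we have

$$\begin{bmatrix} X_n'^{r} + iX_n'^{i} \\ Y_n'^{r} + iY_n'^{i} \end{bmatrix} = e^{inb} \begin{bmatrix} X_n^{r} + iX_n^{i} \\ Y_n^{r} + iY_n^{i} \end{bmatrix}$$

where b is a constant.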

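Let $X_n = |X_n|\,e^{i\phi_x}$ and $Y_n = |Y_n|\,e^{i\phi_y}$. Then

$$\det\begin{bmatrix} X_n^{r} & X_n^{i} \\ Y_n^{r} & Y_n^{i} \end{bmatrix} = |X_n|\,|Y_n| \det\begin{bmatrix} \cos\phi_x & \sin\phi_x \\ \cos\phi_y & \sin\phi_y \end{bmatrix} = |X_n|\,|Y_n| \sin(\phi_y - \phi_x)$$

$$\det\begin{bmatrix} X_n'^{r} & X_n'^{i} \\ Y_n'^{r} & Y_n'^{i} \end{bmatrix} = |X_n|\,|Y_n| \det\begin{bmatrix} \cos(\phi_x + nb) & \sin(\phi_x + nb) \\ \cos(\phi_y + nb) & \sin(\phi_y + nb) \end{bmatrix} = |X_n|\,|Y_n| \sin(\phi_y - \phi_x)$$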

Combining  these  two  cases,  we  get: 

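Theorem 2. For an arbitrary affine transformation,

$$I_n = \det\begin{bmatrix} X_n^{r} & X_n^{i} \\ Y_n^{r} & Y_n^{i} \end{bmatrix} \Big/ \det\begin{bmatrix} X_1^{r} & X_1^{i} \\ Y_1^{r} & Y_1^{i} \end{bmatrix}, \qquad n = 2, 3, \ldots$$

is a set of invariants. The factor det A introduced in case 1 cancels in the ratio, and case 2 leaves each determinant unchanged.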
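The invariance argument can be exercised end to end. The sketch below assumes the contour samples are equally spaced in an affine-linear parameter such as t in equation (5). It also assumes the determinants are normalized by the first-order one so that det A cancels; that normalization is an assumption of this sketch. The name `affine_invariants` is hypothetical.

- An affine map applied to the samples multiplies each coefficient pair by A.
- A cyclic shift of the samples, which models a change of initial point, multiplies X_n and Y_n by the same phase factor.
- Translation affects only the n = 0 coefficient.

```python
import numpy as np

def affine_invariants(contour, n_count=15):
    """Return Delta_n / Delta_1 for n = 2 .. n_count + 1 (hypothetical helper).

    contour: (N, 2) samples of W(t) = [x(t), y(t)], equally spaced in t.
    Delta_n = det [[X_n^r, X_n^i], [Y_n^r, Y_n^i]] = Im(conj(X_n) * Y_n).
    """
    X = np.fft.fft(contour[:, 0])
    Y = np.fft.fft(contour[:, 1])
    delta = X.real * Y.imag - X.imag * Y.real
    return delta[2:n_count + 2] / delta[1]

rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 64, endpoint=False)
contour = np.stack([np.cos(t) + 0.3 * np.cos(3 * t),
                    0.5 * np.sin(t) + 0.2 * np.sin(2 * t)], axis=1)
contour += 0.05 * rng.normal(size=contour.shape)       # make the shape less symmetric

A = np.array([[1.5, 0.4], [-0.3, 0.8]])
B = np.array([2.0, -1.0])
moved = np.roll(contour @ A.T + B, 7, axis=0)          # affine map + new initial point
print(np.allclose(affine_invariants(contour), affine_invariants(moved)))  # True
```

With the default `n_count=15` the function returns 15 values. That number is chosen here only for illustration.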

Figure 2 shows the six aircraft models to be identified. Figure 3 shows eight different configurations of model D, obtained by affine transformation. Table 1 lists the invariants determined by the method described in this work.
Table 1. Invariants Determined Corresponding to Aircraft Shown in Figure 3

A: -2.11, 0.385, -2.31, 0.338, 0.520, -0.813, -0.401, 0.171, -0.163, -0.312
B: -2.13, -0.90, [illegible], [illegible], 0.121, -0.792, -0.132, 0.159, -0.332, -0.107
C: -2.12, 0.361, -2.30, -0.843, -0.356, 0.155, -0.228, -0.293
D: -2.25, -0.23, -2.26, -0.927, -0.188, 0.254, -0.347, -0.206
E: -2.19, 0.241, -2.30, 0.366, 0.562, [illegible], [illegible], [illegible], -0.283, -0.251
F: -2.22, 0.245, -2.29, 0.358, -0.886, -0.302, 0.225, -0.276, -0.265
G: -2.15, 0.306, -2.30, 0.529, -0.846, -0.346, 0.194, -0.277
H: -2.20, 0.165, -2.29, 0.531, -0.910, -0.241

Figure 2. Six Aircraft Models Used

Figure 3. Configurations of Model D Obtained by Affine Transformation


IV.  Neural  Network  Classifier 

Several neural network models can be used as classifiers. The most commonly used is the multilayer perceptron, which is usually trained by backpropagation (BP). Let W_ij denote the connection weight between neuron i and neuron j in the subsequent layer. Let d_j and y_j denote the desired and actual outputs of output node j, respectively. The overall error caused by a specific pattern p can then be expressed as follows:

$$E_p = \frac{1}{2}\sum_j (d_j - y_j)^2 \qquad (8)$$

The weight correction process in the conventional BP method can be described as follows:

$$\Delta W_{ij} = -\eta \frac{\partial E_p}{\partial W_{ij}} \qquad (9)$$

where η is the learning rate. Weight correction is repeated iteratively until the convergence condition |ΔW_ij| < ε is met.

As for the selection of η, a value is usually fixed before the learning process begins. If convergence is slow, η is gradually increased. If η is found to be too large, even causing oscillation, it is gradually reduced. There is no specific rule for raising or lowering η; one relies on experience. Usually η is chosen to be a constant; in this work, η = 0.01.

We found that as the algorithm approaches its steady state, its convergence becomes very slow. This is because the weight correction method described above, i.e., the steepest descent method, converges only linearly. The Newton method, which corrects the weights using second-derivative information, converges quadratically. Its disadvantage, however, is that each step may be so large as to cause oscillation. The usual scheme is therefore to use steepest descent to bring the weights near the steady state and then switch to the Newton method, which drastically reduces the number of iterations.

Choosing the right η is thus important for minimizing the number of iterations required. A method has been developed to calculate η dynamically.[4] Given the quantity a defined by that method, η is determined as follows:

if a ≠ 0, then η = a if a < 20, and η = 20 otherwise;
if a = 0, then η = 0.

After this dynamic calculation of η was implemented in this work, the number of iterations was reduced from 900 to approximately 60.
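Equations (8) and (9), the convergence test and the clipping rule for η can be sketched for a three-layer perceptron with sigmoid units. The sketch makes several assumptions: biases are omitted, and the names `bp_step`, `dynamic_eta` and `train` are hypothetical. The quantity a from reference 4 is not reproduced, so `dynamic_eta` only applies the quoted rule to a value supplied by the caller.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bp_step(x, d, W1, W2, eta):
    """One BP correction for a three-layer perceptron (biases omitted).

    Error E_p = 1/2 * sum_j (d_j - y_j)^2 (equation 8) and
    Delta W_ij = -eta * dE_p/dW_ij (equation 9). Returns (E_p, dW1, dW2).
    """
    h = sigmoid(W1 @ x)                              # hidden-layer outputs
    y = sigmoid(W2 @ h)                              # actual outputs y_j
    E = 0.5 * np.sum((d - y) ** 2)
    delta_out = (y - d) * y * (1.0 - y)              # dE_p / d(net input of output j)
    delta_hid = (W2.T @ delta_out) * h * (1.0 - h)   # propagated back to hidden units
    return E, -eta * np.outer(delta_hid, x), -eta * np.outer(delta_out, h)

def dynamic_eta(a, cap=20.0):
    """Quoted rule: eta = 0 if a == 0, a if a < cap, otherwise cap."""
    if a == 0:
        return 0.0
    return a if a < cap else cap

def train(x, d, W1, W2, eta=0.01, eps=1e-5, max_iter=100_000):
    """Repeat corrections until every |Delta W_ij| < eps. Updates W1, W2 in place."""
    for it in range(max_iter):
        E, dW1, dW2 = bp_step(x, d, W1, W2, eta)
        W1 += dW1
        W2 += dW2
        if max(np.abs(dW1).max(), np.abs(dW2).max()) < eps:
            break
    return W1, W2, E, it + 1

# 15 inputs, 10 hidden units, 6 outputs (one per aircraft class).
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(10, 15))
W2 = rng.normal(scale=0.5, size=(6, 10))
x, d = rng.normal(size=15), np.eye(6)[2]
```

The input and output sizes follow Section V. The hidden size of 10 is the lower end of the range tried there.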
V. Experimental Results

In this work, a three-layer perceptron is chosen as the neural network classifier. It has 15 input nodes and 6 output nodes. The number of hidden nodes is adjustable; we tried values in the range 10 to 50. For each of the six aircraft shown in Figure 2, 15 different configurations are produced by affine transformation, so a total of 90 input samples are used to train the perceptron.

The noise tolerance of the multilayer perceptron was also investigated. For each aircraft model, a group of noise-superimposed images was produced to test the accuracy of the classifier. Each edge point is assumed to move arbitrarily within an L × L window centered on it, so the noise increases as L increases. For instance, Figure 4 shows, from left to right, the configurations as the noise level increases from 1 to 5. At every noise level L, 15 test patterns are generated for each aircraft, so a total of 90 noise-superimposed images are used to test the identification accuracy of the network. Figure 5 shows the error rate as a function of the noise level L.

It was found that even the network trained with noise-free samples has a high resistance to noise. The reason is that some invariants play a primary role in classification, while the remaining ones play a secondary role. As the noise level rises, its impact on the primary invariants is far smaller than on the secondary ones, so the three-layer perceptron can rely on the primary invariants to perform classification. Finally, when noise-superimposed images were used to train the network, we found that it had better noise tolerance.
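The noise model can be sketched as a random displacement of each edge point. The text says only that each point moves "arbitrarily" within an L × L window. This sketch therefore assumes a continuous offset drawn uniformly from [-L/2, L/2] on each axis. The helper `add_window_noise` is hypothetical.

```python
import numpy as np

def add_window_noise(contour, L, rng):
    """Displace every edge point uniformly within an L x L window centred on it.

    Assumption: offsets are continuous and uniform on [-L/2, L/2] per axis.
    """
    offsets = rng.uniform(-L / 2.0, L / 2.0, size=contour.shape)
    return contour + offsets

# 15 noisy test patterns per noise level L = 1..5 for one (toy) contour.
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 200, endpoint=False)
contour = np.stack([60 * np.cos(t), 30 * np.sin(t)], axis=1)
test_sets = {L: [add_window_noise(contour, L, rng) for _ in range(15)] for L in range(1, 6)}
```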

Figure 4. Noise-Superimposed Models (Noise Level 1 to 5 from left to right)

Figure 5. Noise Level vs. Identification Error


VI.  Conclusions 

An aircraft identification method based on invariants under affine transformation is presented. The pattern recognition step is performed by a BP network. The method is applicable to any plane pattern with a continuous, smooth boundary. The accelerated learning algorithm introduced here can drastically reduce computing time. It is general in nature and can be extended to many neural network learning processes.
[Text] Abstract

This paper presents a time normalization method based on dynamic programming, implemented as a DTW neural network. DTW (dynamic time warping) is one of the most effective methods for spoken word recognition. It is very robust and provides a very high recognition rate. However, it requires a great deal of computing time, so without special hardware its use is limited. In this work, all the computation is carried out by two recurrent subnets and a memory layer. This method demonstrates the advantages of a hard-wired architecture and offers a new strategy for implementing DTW in hardware.

Key  words:  neural  network,  speech  processing,  pattern 
recognition. 

I.  Introduction 

It is well known that, because speech signals are highly variable, speech patterns are often nonlinearly distorted along the time axis. Resolving this problem, i.e., the time normalization problem, is an important issue in speech recognition. DTW is a nonlinear time normalization algorithm. By normalizing two input speech signals against each other, it maximizes their acoustic similarity and minimizes their cumulative distance.

When DTW is used to recognize an isolated word, the unknown template is compared one by one with all the reference templates. The reference template with the least cumulative distance is taken as the result.

An effective algorithm produces satisfactory results only with appropriate hardware. Recent studies[2,3] show that neural networks are an important technique for optimization. In this paper, we demonstrate the use of a neural network to solve the dynamic time normalization problem. Furthermore, an architecture is proposed for implementing this neural network with optics and semiconductors.

The  second  part  of  this  paper  briefly  discusses  the  DTW 
algorithm.  The  third  part  shows  a  neural  network  archi¬ 
tecture  to  implement  this  algorithm.  Finally,  the  efficacy 
of  this  method  is  verified  with  a  real  example. 


Figure  1.  Limiting  Window  for  Normalization  on  I-J 
Plane  and  Normalization  Function 


II. Dynamic Time Normalization Algorithm

Speech can be expressed in terms of suitable parameters. We use feature vectors to express the speech signals:

$$A = a_1, a_2, \ldots, a_i, \ldots, a_I, \qquad B = b_1, b_2, \ldots, b_j, \ldots, b_J \qquad (1)$$

Now let us consider how to eliminate the time difference between these two speech patterns. Consider the I-J plane shown in Figure 1, where the I axis and J axis represent the unknown and reference patterns, respectively. The time correspondence between the two patterns can be expressed by a sequence of points c = (i, j):

$$F = c(1), c(2), \ldots, c(K) \qquad (2)$$

where

$$c(k) = (i(k), j(k)) \qquad (3)$$

F is known as the normalization function; it maps pattern A onto pattern B. When there is no time difference between the two input speech patterns, the normalization function is the diagonal line i = j. As the time difference increases, the normalization function deviates farther from the diagonal. To measure the difference between feature vectors a_i and b_j, we define the local distance (LD) as

$$d(c) = d(i, j) = \lVert a_i - b_j \rVert \qquad (4)$$

The weighted sum of LD along F is defined as the general distance (GD):

$$E(F) = \sum_{k=1}^{K} d(c(k))\, W(k) \qquad (5)$$

where W(k) is a weighting coefficient which, together with the slope constraint[1], controls the normalization function. F is chosen so that E(F) is minimized; correspondingly, the time difference between the two matching patterns is compensated. The time-normalized distance between patterns A and B is defined as

$$D(A, B) = \min_F \frac{\sum_{k=1}^{K} d(c(k))\, W(k)}{\sum_{k=1}^{K} W(k)} \qquad (6)$$

where $\sum_{k} W(k)$ in the denominator compensates for the effect of the weights on F.

Based on the characteristics of speech, a number of constraints are imposed on the normalization function[1], such as monotonicity, continuity and boundary conditions on the variation of the speech parameters. Usually, the search area of the normalization function is limited to a window on the I-J plane, called the limiting window, as shown in Figure 1.

During the minimization of D(A, B), the GD is determined for every point in the limiting window of Figure 1, since any of these points may become a point of the normalization function F. The following recursion is used to optimize D(A, B):

$$g(c(k)) = \min_{c(k-1)} \big[\, g(c(k-1)) + d(c(k))\, W(k) \,\big] \qquad (7)$$

In Figure 1, the partial general distance (PGD) g is calculated step by step, from left to right and from bottom to top. In the equation, c(k − 1) ranges over all admissible matching points of the previous step. In the following, we discuss how to implement this process with two recurrent neural subnets, in which every point is linked by its code. For the problem under investigation in this work, equation (7) may be written as

$$g(i, j) = \min \begin{cases} g(i, j-1) + d(i, j) \\ g(i-1, j-1) + 2\,d(i, j) \\ g(i-1, j) + d(i, j) \end{cases} \qquad (8)$$
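In conventional (non-neural) form, recursion (8) together with the backward search is ordinary dynamic programming. The sketch below makes three assumptions:

- the start condition g(1, 1) = 2d(1, 1), weighted like a diagonal step;
- an optional band |i − j| ≤ r as the limiting window, whose exact shape the text does not give;
- a hypothetical integer coding of the chosen local path, which plays the role of the BTP record.

With weights 2 (diagonal) and 1 (horizontal or vertical), every path from (1, 1) to (I, J) has total weight I + J. The final distance is therefore g(I, J)/(I + J).

```python
import numpy as np

def dtw(A, B, r=None):
    """Symmetric DTW following recursion (8). Returns (D(A, B), warping path F).

    A: (I, p) unknown pattern; B: (J, p) reference pattern; rows are feature vectors.
    r: optional half-width of a band |i - j| <= r used as the limiting window.
    """
    I, J = len(A), len(B)
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)  # LD, equation (4)
    g = np.full((I, J), np.inf)                               # PGD table
    btp = np.zeros((I, J), dtype=int)   # hypothetical coding: 0 diagonal, 1 from (i, j-1), 2 from (i-1, j)
    g[0, 0] = 2 * d[0, 0]               # assumed start condition
    for i in range(I):
        for j in range(J):
            if (i, j) == (0, 0) or (r is not None and abs(i - j) > r):
                continue
            cands = [
                g[i - 1, j - 1] + 2 * d[i, j] if i > 0 and j > 0 else np.inf,
                g[i, j - 1] + d[i, j] if j > 0 else np.inf,
                g[i - 1, j] + d[i, j] if i > 0 else np.inf,
            ]
            btp[i, j] = int(np.argmin(cands))
            g[i, j] = cands[btp[i, j]]
    if np.isinf(g[-1, -1]):
        raise ValueError("end point lies outside the limiting window")
    # Backward search from (I, J) to (1, 1), following the stored pointers.
    step = {0: (-1, -1), 1: (0, -1), 2: (-1, 0)}
    i, j = I - 1, J - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        di, dj = step[btp[i, j]]
        i, j = i + di, j + dj
        path.append((i, j))
    return g[-1, -1] / (I + J), path[::-1]

x = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([[0.0], [1.0], [1.0], [2.0], [3.0]])   # x with one frame repeated
print(dtw(x, y))   # distance 0.0: the warping absorbs the repeated frame
```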


For the local paths of equation (8), $\sum_{k=1}^{K} W(k) = I + J$. When the computation reaches the last point (I, J), D(A, B) is obtained as g(I, J)/(I + J).

If an optimization strategy table is used to record the local path chosen to derive each PGD, the normalization function F can be traced in the reverse direction. When the optimization process is completed, the optimal normalization path F can, if necessary, be constructed using the associative memory BTP. Tracking begins from the last node, and F is reconstructed by connecting the nodes pointed to by the BTP neurons.

III. Neural-Network-Guided Computing Architecture

In the optimization process, a recurrent chain table is created for all states associated with axes I and J. Each state is coded with a binary state vector Y in a recurrent network, namely network layer I and network layer J of the guiding network. I and J are coded with M1 and M2 neurons, respectively, taking values in {-1, 1}, such that $2^{M_1} \ge I$ and $2^{M_2} \ge J$. To make it convenient to define the position of a point in Figure 1, the codes of i and j are combined into one state vector Y. Since the number of neurons grows only logarithmically with I and J, the network scales well to matching long sample sequences.

A state thus describes a point with reference to guiding-layer networks I and J. A converter maps each binary state vector to the corresponding PGD and BTP (backward tracking pointer). This converter is implemented by a feedforward network layer called the memory layer.

1. Guiding Layer

This layer contains two recurrent neural subnets consisting of M1 and M2 fully connected higher-order neurons[2], respectively, with R = M1 + M2. The state of the neurons in this layer is expressed as $S = \{Y^1, Y^2, \ldots, Y^{2^R}\}$, where $Y = (Y_1, Y_2, \ldots, Y_R) \in \{-1, 1\}^R$. Let y and y' be the past and present states of a subnet with M neurons. The recurrent process can then be described as

$$y'_h = \operatorname{sgn}\Big( \sum_{\alpha \in \{0,1\}^M} W_h^{\alpha} \prod_{i=1}^{M} y_i^{\alpha_i} \Big) \qquad (9)$$

The summation covers all M-bit binary strings $\alpha = \alpha_1\alpha_2\cdots\alpha_M \in \{0,1\}^M$. We use $W_h = (W_h^{00\cdots0}, W_h^{00\cdots1}, \ldots, W_h^{11\cdots1})$ to denote the $2^M$-dimensional higher-order weight vector of neuron h. Then, by using $\eta = (\eta_{00\cdots0}, \eta_{00\cdots1}, \ldots, \eta_{11\cdots1}) \in \{-1, 1\}^{2^M}$ with

$$\eta_\alpha(y) = \prod_{i=1}^{M} y_i^{\alpha_i} \qquad (10)$$

the state vector y may be expanded into a $2^M$-dimensional state vector; the ith bit of α gives the power of $y_i$. From the above, the architecture[3] given by equations (9) and (10) is sufficient to realize any mapping $\{-1,1\}^M \to \{-1,1\}^M$. The learning rule for the recurrent weights is

$$W_h = \frac{1}{2^M} \sum_{k=1}^{2^M} y_h^{k+1}\, \eta(y^k) \qquad (11)$$
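The reconstructed rule (11) realizes a chain exactly, and the reason can be stated directly. For y, y' ∈ {-1, 1}^M, the sum over α of η_α(y)·η_α(y') equals the product over i of (1 + y_i·y'_i). That product is 2^M when y = y' and 0 otherwise. Hence W_h·η(y^k) = y_h^{k+1} exactly. The sketch below assumes equations (9) to (11) take the forms shown above; `eta`, `learn_chain` and `evolve` are hypothetical names. It trains a subnet on a random ordering of all 2^M states and checks that each state evolves into its successor.

```python
import itertools
import numpy as np

def eta(y):
    """Equation (10): eta_alpha(y) = prod_i y_i ** alpha_i, alpha ordered 00..0 to 11..1."""
    return np.array([np.prod([yi ** ai for yi, ai in zip(y, a)])
                     for a in itertools.product((0, 1), repeat=len(y))])

def learn_chain(chain):
    """Equation (11): W_h = 2^-M sum_k y_h^{k+1} eta(y^k); y^1 follows the last state."""
    M = chain.shape[1]
    nxt = np.roll(chain, -1, axis=0)            # row k holds y^{k+1}
    E = np.array([eta(y) for y in chain])       # row k holds eta(y^k)
    return nxt.T @ E / 2 ** M                   # row h is the weight vector W_h

def evolve(W, y):
    """Equation (9): y'_h = sgn(W_h . eta(y))."""
    return np.sign(W @ eta(y)).astype(int)

M = 3
all_states = np.array(list(itertools.product((-1, 1), repeat=M)))
chain = all_states[np.random.default_rng(1).permutation(len(all_states))]
W = learn_chain(chain)
```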

The state evolves in the recurrent subnets to create the node sequence of the chain table. To establish the two recurrent subnets, consider the finite sequence $y^1 \to y^2 \to \cdots \to y^{K_n}$, $K_n \in \{1, \ldots, 2^M\}$. As discussed in the previous section, when the system is initialized, all neuron states are linked from bottom to top along the I axis and from left to right along the J axis. By equation (9), a new state is completely determined by the state at the preceding moment. In addition, $y^1$ is defined to immediately follow $y^{2^M}$, forming a recurrent chain.

In this network, adding or removing states is a simple operation. To delete the present state $y^k$, one breaks the links established by equation (11): the link from the previous state to the present state and the link from the present state to the future state. The chain can then be reconnected.

(12) 

Then, according to the following equation

$$\Delta W_h = \frac{1}{2^M}\, y_h^{k+1}\, \eta(y^{k-1}) \qquad (13)$$

This simple delete operation makes it easy to reconnect the chain in this network. A new state is added to the chain table using equations (12) and (13) as well. The difference is that equation (13) must then connect the previous state to the new state and the new state to the future state.

2.  Memory  Layer 

To perform the numerical computation and the BTP recording, a feedforward network called the memory layer is introduced. It associates every state of the guiding layer with its LD, PGD and BTP values.

This network operates in a manner similar to the first layer. It receives the R-bit binary state vector as input but outputs only two real numbers. One is LD, which is later updated to PGD; the other is BTP. The layer therefore has two weight vectors, W_d and W_p. The mapping from the state vector onto LD is learned in the same manner as in equation (11):

$$W_d = \frac{1}{2^R} \sum_{k=1}^{2^R} d(y^k)\, \eta(y^k) \qquad (14)$$

In the DTW procedure of this work, every node in the chain table has three local paths to choose from. After the optimal local path is chosen by equation (8), the minimum PGD is obtained, and the preceding point on the optimal path is recorded for the backward search. The PGD is stored by updating the LD neuron, that is, by updating the weight W_d as follows:

$$\Delta W_d = \frac{1}{2^R} \big[\, g(y^k) - d(y^k) \,\big]\, \eta(y^k) \qquad (15)$$

Furthermore, the BTP neuron takes values in {-1, 0, 1}, each value representing a possible local path. We have

$$W_p = \frac{1}{2^R} \sum_{k=1}^{2^R} \mathrm{btp}(y^k)\, \eta(y^k)$$

The BTP neuron may be initialized by training the weights so that its output is -1 for every column of the I-J plane, traversed from bottom to top. When a point lies outside the limiting window, its output is 1. In all other situations, its output is 0.
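The memory layer behaves like an addressable store over the 2^R state codes. The sketch below assumes equations (14) and (15) take the forms shown above. Writing the initial values into the weights is then (14), and replacing the value stored for one state is (15). Orthogonality of the η vectors guarantees that such a write leaves every other state's value unchanged. The same construction serves W_p with btp values in {-1, 0, 1}. The names `init_memory`, `read` and `write` are hypothetical.

```python
import itertools
import numpy as np

def eta(y):
    return np.array([np.prod([yi ** ai for yi, ai in zip(y, a)])
                     for a in itertools.product((0, 1), repeat=len(y))])

def init_memory(states, values):
    """Equation (14): W = 2^-R sum_k value(y^k) eta(y^k)."""
    R = states.shape[1]
    return sum(v * eta(y) for y, v in zip(states, values)) / 2 ** R

def read(W, y):
    """Output of the memory-layer neuron for state y."""
    return float(W @ eta(y))

def write(W, y, new_value):
    """Equation (15): move the stored value for y (LD) to new_value (PGD)."""
    return W + (new_value - read(W, y)) * eta(y) / 2 ** len(y)

R = 3
states = np.array(list(itertools.product((-1, 1), repeat=R)))
ld = np.random.default_rng(2).uniform(0, 5, size=len(states))
W = init_memory(states, ld)
```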

3.  Auxiliary  Control  Unit 

Figure 2 shows the complete network architecture. Several control units are used in the local path processing module.


The comparator COM evaluates the minimum in equation (8). From the three input values, it produces two outputs: the minimum g(i, j) and the BTP marker. The comparator output simultaneously triggers the update of the LD neuron and the recording of the BTP marker.

An  accumulator  is  placed  in  front  of  the  comparator  to 
generate  the  three  possible  PGD  values  for  comparison. 

The addresser ADDR monitors and records the possible initial addresses of the local paths in subnets I and J. It consists of a register CUR that stores the current state and registers PRE1, PRE2 and PRE3 that store the previous states (i, j-1), (i-1, j-1) and (i-1, j), respectively. For a given state (i, j), it finds the address of state (i, j-1) by keeping subnet I fixed and evolving subnet J J-1 times. Because subnet J is a recurrent chain of J states, J-1 forward evolutions are equivalent to one step backward. The addresses of (i-1, j-1) and (i-1, j) are found in the same fashion.
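The addressing trick uses only forward evolutions. In a cyclic chain of `length` states, stepping forward length − 1 times lands on the predecessor. The sketch below is a minimal illustration with a generic successor function standing in for one evolution of a subnet. The name `predecessor` and the toy state labels are hypothetical.

```python
def predecessor(successor, state, length):
    """Return the state preceding `state` in a cyclic chain of `length` states,
    using only forward evolutions: length - 1 steps forward equal one step back."""
    for _ in range(length - 1):
        state = successor(state)
    return state

# Toy subnet J with J = 5 states, linked cyclically y1 -> y2 -> ... -> y5 -> y1.
chain = ["y1", "y2", "y3", "y4", "y5"]
succ = {s: chain[(k + 1) % len(chain)] for k, s in enumerate(chain)}
print(predecessor(succ.__getitem__, "y3", len(chain)))   # prints y2
```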

A number of triggers are employed in the system. They are attached to the BTP neurons to guide the computing process.

Trigger  SIG  reports  the  completion  of  the  normalization 
process,  i.e.,  the  optimization  of  F,  and  triggers  the 
backward  search. 

Triggers  IINC,  JINC  and  PBLK  control  the  operations  in 
the  evolution  and  processing  modules  of  subnets  I  and  J. 
IV. Speech Recognition Experiment Using Isolated Characters

Speech signals are digitized by a 22-kHz A/D converter. The speech waveform is preprocessed with partially overlapping Hamming windows, each 22 milliseconds wide. Feature vectors are extracted after the signal passes through a 20-channel narrow-band filter array. The speech database consists of the 10 single-syllable words for the digits 0 to 9 in Cantonese. The reference samples are provided by 20 males and 20 females. The test samples are taken from five females and five males other than those who provided the reference samples.
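The front end can be approximated digitally. The sketch below takes the 22-kHz sampling rate, the 22-ms Hamming window (484 samples) and the 20 channels from the text. It also makes assumptions not stated there:

- 50 percent overlap (the text says only "partially overlapping");
- the analogue narrow-band filter array replaced by summed FFT power in 20 equal-width bands;
- log compression of the band energies.

The name `features` is hypothetical.

```python
import numpy as np

FS = 22_000                  # sampling rate (22 kHz)
FRAME = 22 * FS // 1000      # 22-ms window -> 484 samples
HOP = FRAME // 2             # assumption: 50 percent overlap
N_BANDS = 20                 # 20 channels

def features(signal):
    """Hamming-windowed frames -> log energies in 20 frequency bands, one row per frame."""
    n_frames = 1 + (len(signal) - FRAME) // HOP
    window = np.hamming(FRAME)
    frames = np.stack([signal[k * HOP:k * HOP + FRAME] * window for k in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    bands = np.array_split(power, N_BANDS, axis=1)
    return np.log(np.stack([b.sum(axis=1) for b in bands], axis=1) + 1e-12)

one_second = np.random.default_rng(0).normal(size=FS)
print(features(one_second).shape)   # (frames, 20)
```

Each row is one feature vector a_i or b_j in the sense of equation (1).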


The Euclidean distance between the two vectors is used to initialize the local distance LD of the network, and the memory layer is initialized according to equation (14). Before matching is performed, the chain table of all the states is initialized independently in subnets I and J. For a given pair of patterns (A, B) to be matched, states $y^{I+1}, \ldots, y^{2^{M_1}}$ in net I are deleted from the subnet according to equations (12) and (13). State $y^I$ is then connected to $y^1$ to make the subnet recurrent. Determining the position of $y^{I-1}$ from $y^I$ is equivalent to letting the subnet go through I - 1 evolutions. Extraneous states in network J are deleted in the same way.


Starting from state (1, 1), as described in Section III, the outputs of the BTP neurons are read sequentially by triggers IINC, JINC and PBLK to determine the interconnection between states. BTP is then updated to record the optimal local path, and the LD neuron is updated to the PGD value. When the BTP neuron output is 0 or 1, trigger IINC causes network I to evolve. When the BTP output is -1, trigger JINC causes network J to evolve. When the BTP output is 0, trigger PBLK sets the processing module into operation. When SIG indicates that the search is over, the general distance g(I, J) is compared with the minimum general distance MGD already stored in a register. If it is less than MGD, the stored reference sample index (ID) is replaced by that of the new word. Once the sample to be identified has been compared with all the reference samples, the recognition process is complete. Figure 2 shows the control flowchart of the neural-network-guided computation for speech recognition.
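The outer recognition loop amounts to a running minimum over the reference set. In the sketch below, updating MGD together with the ID is an assumption, since the text mentions replacing only the ID. The `distance` argument can be any time-normalized distance, such as D(A, B) from equation (6). The name `recognize` is hypothetical.

```python
import math

def recognize(unknown, references, distance):
    """Compare `unknown` with every reference, keeping the best as the MGD register does.

    references: iterable of (ID, template) pairs.
    distance: callable returning a time-normalized distance between two patterns.
    """
    mgd, best_id = math.inf, None          # MGD register and stored reference index
    for ref_id, template in references:
        g = distance(unknown, template)
        if g < mgd:                        # smaller distance: replace ID (and MGD)
            mgd, best_id = g, ref_id
    return best_id, mgd

# Toy usage with scalar "patterns" and absolute difference as the distance.
refs = [("0", 0.0), ("3", 3.0), ("7", 7.0)]
print(recognize(3.2, refs, lambda a, b: abs(a - b)))
```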


The optimal normalization function F is recorded by the BTP neurons. Working in the same way as ADDR in the processing module, the optimal path can be backtracked step by step and linked together until the starting point (1, 1) is reached.

It was found experimentally that the speaker-independent recognition rate for isolated words is as high as 99.0 percent. This is comparable to that of conventional algorithms.[4]


V.  Conclusions 

This experiment points out a way to implement the conventional DTW technique with a hardware network, which we consider a significant result. A hard-wired architecture can be expected to increase the speed of the optimization. In the neural network presented in this work, the LD values can be computed in parallel before optimization. These advantages, together with the inherent parallelism of neural network architectures, make the implementation of DTW more efficient.
[Text]


Abstract 

This paper presents an approximate logic system in which not only the logic values but also the logic operators can be fuzzy. This approximate logic system is well suited to neural networks. A neural network model is established on this basis; it can learn and store knowledge. On the basis of this neural net, a multi-expert opinion synthesizer is developed.

Key  words:  approximate  logic,  and-or  degree  of  oper¬ 
ator,  expert  opinion  synthesis. 


I.  Introduction 

Neural networks have received widespread attention because of their ability to process inaccurate and fuzzy information. Nevertheless, their use for knowledge processing is very limited. To overcome this weakness, we define an approximate logic that can properly describe a neural network. It has not only fuzzy logic values but also fuzzy logic operators. Finally, a neural-network-based multi-expert opinion synthesis system is described. It can store rules and can be trained. The conventional weighting method is equivalent to the special case in which the neural net has no stored knowledge. After receiving opinions from the various experts, the system enters a "pondering" period and then makes the final judgment. This process resembles human deliberation and is worth further investigation.

II.  Approximate  Logic  and  Neural  Network 
1.  Approximate  Logic 

An approximate logic system is defined as follows. The logic values of the system lie in [0, 1]. By varying some parameters, a function may be switched from "or" to "and," or from "yes" to "no." Unlike fuzzy logic, not only its logic values but also its operators can be fuzzy. The following theorems are fairly easy to prove; owing to page limitations, the proofs are omitted.


Definition 1. Approximate Logic. The logic values of the system lie in [0, 1]. $A_1, A_2, \ldots, A_k$ are approximate logic variables. tr and $w_i$, i = 1, ..., k, are a threshold constant and weights, respectively, with each weight in [-1, 1]. Let

$$F = \sum_{i=1}^{k} w_i A_i, \qquad AF(A_1, \ldots, A_k) = \begin{cases} F & \text{if } 0 < F < 1 \text{ and } F \ge tr \\ 1 & \text{if } F \ge 1 \text{ and } F \ge tr \\ 0 & \text{otherwise} \end{cases}$$

It is easy to see that the inverse of AF is a multivalued function.

Theorem 1. Assume that each A_i can only be 0 or 1.

(1) If tr = 1 and w_i = 1, i = 1, ..., k, then $AF(A_1, \ldots, A_k) = A_1 \vee A_2 \vee \cdots \vee A_k$.

(2) If tr = k and w_i = 1, i = 1, ..., k, then $AF(A_1, \ldots, A_k) = A_1 \wedge A_2 \wedge \cdots \wedge A_k$.

(3) If w_1 = -1, w_2 = 1 and tr = 1, then $AF(A_1, 1) = \neg A_1$.

(4) If w_1 = 1 and tr = 0, then $AF(A_1) = A_1$.

Therefore, every Boolean function can be expressed in terms of AF.

Proof: Omitted.
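Theorem 1 and the "and/or" degree of Definition 2 can be checked exhaustively for small k. The sketch assumes Definition 1 takes the form shown above, with F as the weighted sum of the A_i. The function names `AF` and `aod` mirror the text's notation but are otherwise hypothetical.

```python
from itertools import product

def AF(A, w, tr):
    """Approximate logic function (Definition 1): F = sum_i w_i A_i, then
    1 if F >= 1 and F >= tr; F if 0 < F < 1 and F >= tr; 0 otherwise."""
    F = sum(wi * ai for wi, ai in zip(w, A))
    if F >= tr and F >= 1:
        return 1
    if F >= tr and F > 0:
        return F
    return 0

def aod(w, tr):
    """'And/or' degree of operator (Definition 2): tr / sum_i w_i."""
    return tr / sum(w)

k = 3
for A in product((0, 1), repeat=k):
    assert AF(A, [1] * k, 1) == int(any(A))      # Theorem 1 (1): OR
    assert AF(A, [1] * k, k) == int(all(A))      # Theorem 1 (2): AND
for a in (0, 1):
    assert AF((a, 1), [-1, 1], 1) == 1 - a       # Theorem 1 (3): NOT, via constant input 1
    assert AF((a,), [1], 0) == a                 # Theorem 1 (4): identity

print(AF((0.25, 0.5), [1, 1], 0.5), aod([1, 1, 1], 1))
# prints: 0.75 0.3333333333333333
```

With fuzzy inputs, AF passes the weighted sum through once the threshold is met. Raising tr toward the sum of the weights moves the operator from "or" toward "and."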
Definition 2. If the weights of a function of n variables belong to [0, 1] and its threshold satisfies 0 < tr < n, then its "and/or" degree of operator is defined as

$$\mathrm{aod}(AF) = \frac{tr}{\sum_{i=1}^{n} w_i}$$

As aod changes from 0 to 1, the function changes from $\bigvee_{i=1}^{n} A_i$ to $\bigwedge_{i=1}^{n} A_i$.


Through the use of aod, a fuzzy operation can be discretized into the corresponding Boolean operations.

2. Neural Network Model

Our neural network consists of four parts: a weight set, a node set, a threshold set and an output set. $w_{ij} \in [-1, 1]$ is the weight from node i to node j; here we assume that $w_{ij} \in [0, 1]$. $s_i$ is the excitation value of node i and $out_i$ is its output. The system then operates according to

$$s_i(t+1) = \sum_{j=1}^{n} w_{ji}\, out_j(t) + b_i(t) \qquad (1)$$

where $b_i(t)$ is a coefficient in [0, 1], and

$$out_i(t) = \begin{cases} s_i(t) & \text{if } \min < s_i(t) < \max \text{ and } s_i(t) \ge tr_i \\ \max & \text{if } s_i(t) \ge \max \text{ and } s_i(t) \ge tr_i \\ \min & \text{otherwise} \end{cases} \qquad (2)$$

In general, max = 1 and min = 0, or max = 1 and min = -1. The net input $net_i(t)$ of node i collects all terms of equation (1) other than the self-feedback term, so that

$$s_i(t+1) = net_i(t) + w_{ii}\, out_i(t)$$

Since all propositions are self-supporting, $w_{ii} \in [0, 1]$.

Definition 3. A Boolean system is a binary system whose output is either 0 or 1; in this case, max = 1 and min = 0.

Theorem 2. If a system has max = 1 and min = 0, and each weight belongs to [0, 1], then each node is equivalent to an approximate logic function.

Proof: Omitted.

Corollary 1. Every Boolean function can be implemented using a Boolean system.

Proof: Omitted.

Theorem 3. Suppose $net_i(t) = p$, where p is a constant and min < p < max.

(1) If $p + w_{ii}\, out_i(t) < tr_i$, then $out_i(t+k) = \min$, k = 1, 2, ....

(2) If $p + w_{ii}\, out_i(t) \ge tr_i$, $w_{ii} = 1$ and p > 0, then $out_i(t+k) = \max$ for $k \ge (\max - tr_i)/p$.

(3) If $w_{ii} = 1$ and p < 0, then $out_i(t+k) = \min$ for $k \ge (\max - tr_i)/|p|$.

(4) If $p + w_{ii}\, out_i(t) \ge tr_i$, $0 < w_{ii} < 1$ and $p/(1 - w_{ii}) \ge tr_i$, then $out_i(t+k) \to p/(1 - w_{ii})$ as $k \to +\infty$.

(5) If $p + w_{ii}\, out_i(t) \ge tr_i$, $0 < w_{ii} < 1$ and $p/(1 - w_{ii}) < tr_i$, then $out_i(t+k) \to \min$ as $k \to +\infty$.

Proof: Omitted.

Conclusion. If $net_i$ is constant, then $out_i$ eventually becomes stable.
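The cases of Theorem 3 can be watched directly by iterating a single node with a constant net input p. The sketch assumes the update s(t+1) = p + w_ii·out(t) and output rule (2) as written above, with min = 0 and max = 1. The helpers `out_fn` and `run_node` are hypothetical.

```python
def out_fn(s, tr, lo=0.0, hi=1.0):
    """Output rule (2): hi if s >= hi and s >= tr; s if lo < s < hi and s >= tr; else lo."""
    if s >= tr and s >= hi:
        return hi
    if s >= tr and s > lo:
        return s
    return lo

def run_node(p, w_self, tr, out0, steps=200, lo=0.0, hi=1.0):
    """Iterate s(t+1) = p + w_self * out(t) with constant net input p; return the final output."""
    out = out0
    for _ in range(steps):
        out = out_fn(p + w_self * out, tr, lo, hi)
    return out

print(run_node(0.1, 0.5, 0.5, 0.0))   # case (1): stays at min = 0
print(run_node(0.2, 1.0, 0.1, 0.0))   # case (2): climbs to max = 1
print(run_node(0.3, 0.5, 0.2, 0.0))   # case (4): approaches p / (1 - w) = 0.6
print(run_node(0.3, 0.5, 0.7, 1.0))   # case (5): p / (1 - w) = 0.6 < tr, decays to min
```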

Definition 4. A neural network can be viewed as a directed graph, called a neural net graph. Each neuron is a node of the graph, and each non-zero weight is an edge of the graph.

Theorem 4. In a neural network, if all the inputs ext_i and their weights are constant, and if the neural net graph has no loop, then the system will not oscillate in operation; it settles into a stable state.
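Whether a network satisfies the no-loop hypothesis of Theorem 4 can be checked mechanically on the neural net graph of Definition 4. The sketch below uses Kahn's topological sort; the function name and the convention that w[i][j] is the weight from node i to node j (with a non-zero self-weight counted as a loop) are our assumptions.

```python
from collections import deque
from typing import List


def has_loop(w: List[List[float]]) -> bool:
    """True if the neural net graph of w contains a directed cycle.

    w[i][j] is the weight from node i to node j; every non-zero weight is an
    edge, so a non-zero self-weight w[i][i] counts as a loop.
    """
    n = len(w)
    indegree = [sum(1 for i in range(n) if w[i][j] != 0) for j in range(n)]
    ready = deque(j for j in range(n) if indegree[j] == 0)
    removed = 0
    while ready:                    # Kahn's algorithm: peel off nodes with no inputs
        i = ready.popleft()
        removed += 1
        for j in range(n):
            if w[i][j] != 0:
                indegree[j] -= 1
                if indegree[j] == 0:
                    ready.append(j)
    return removed < n              # nodes left over lie on or behind a cycle
```

A chain such as w_01 = 0.9, w_12 = 0.8 is loop-free, whereas any pair of opposing non-zero weights, or a single non-zero self-weight, forms a loop.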

Proof:  Omitted. 

Definition 5. The deduction process of the neural network is as follows. A node that does not receive information from other nodes is a conditional node, and the conditional nodes are the premises the system accepts for a deduction. After the system has operated for some time, if the other nodes have stabilized, their states are used to derive a conclusion under these premises.
A looped neural network is usually a chaotic system, and a qualitative analysis of its stability is very difficult to perform. Such a network usually oscillates because it often contains contradictory pieces of information.

Definition 6. Linear Stability Condition. If O = (o_1, o_2, ..., o_k) represents a steady state of the neural network and min < o_i < max for all i, then O is said to be linearly stable.

Theorem 5. Assume that the coefficient b(t) in equation (1) is zero. Then the condition for linear stability is:


Proof:  Omitted. 
Figure 1. Layered Neural Network

III. Neural-Network-Based Multi-Expert Opinion Synthesis System

1.  Purpose  of  Use 

Expert systems are evolving from simple databanks built around a single expert into complex systems involving multiple experts. In a distributed multi-expert system, the key component is a subsystem that synthesizes the experts' opinions. Conventional methods, such as the weighted method, the voting method, and statistical methods, have the following major disadvantages:

(1) There is no mechanism for incorporating experience into the synthesis of expert opinions.

(2) They cannot be trained.

(3) They cannot reflect situations in which the experts' opinions cannot be reconciled.

Since the ability to synthesize experts' opinions is directly related to human intelligence, it is not possible to create an ideal model for it. We believe that any such model should have a number of tunable parameters to meet the needs of different disciplines, and that it should overcome the three major disadvantages described above, at least to some extent. Such a model would help us improve a system's ability to synthesize experts' opinions. Our investigation found that the competition and coordination mechanisms of a neural network perform these functions well.

2.  System  Architecture 

A state cell is used to represent an expert's judgment; each state cell corresponds to one judgment. The value of a state lies in [0, 1]. Each cell in the system has an input end ext_i, through which the experts' various opinions are sent into the system, as shown in Figure 2. In equation (3), b(t) can be regarded as the weight assigned to an expert. Before operation, the experts' common knowledge is stored in the system.

There are two ways to gather the experts' opinions. The first is a parallel listening method, which collects all of the experts' opinions and feeds the result to the system after applying a weight. This method is very similar to the conventional methods. Its advantage is simplicity; its shortcoming is that it cannot reflect disagreement among the experts in detail. It can reflect only the situation in which the collective opinion of the experts is consistent with the knowledge stored in the network. The second is a serial method, which listens to the experts' opinions one at a time. After listening to each expert, the neural network is allowed to complete several processing cycles before listening to the next expert. The opinions of certain experts can also be listened to repeatedly. This process closely resembles the way a human deliberates. In this serial mode, if w_ii = 1, w_ij = 0 for i ≠ j, and tr_i = min, the system reduces to the conventional weighted method: its result is the weighted value of the experts' opinions. In practice, the parameters of the system must be tuned. The neural network is usually unstable; in other words, it is not possible to wait until the system has settled down completely and then query the result.
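The weighted-value behavior of the serial mode can be illustrated with a single cell. This is a hypothetical sketch: equation (3) is not part of this excerpt, so we assume that listening to expert e adds b_e · z_e (expert weight times opinion) to the cell's input for one cycle, with max = 1, min = 0, w_ii = 1 and tr_i = min. Under these assumptions the cell simply accumulates the weighted sum of the opinions, clamped to [min, max]. The names `output` and `serial_listen` are our own.

```python
from typing import List

MAX, MIN = 1.0, 0.0


def output(s: float, tr: float) -> float:
    """Thresholded output (equation (2)), with s == MAX treated as saturated."""
    if s > tr and s >= MAX:
        return MAX
    if s > tr and MIN < s < MAX:
        return s
    return MIN


def serial_listen(opinions: List[float], weights: List[float],
                  think_cycles: int = 3, tr: float = MIN, w_ii: float = 1.0) -> float:
    """Listen to the experts one at a time; return the cell's final output."""
    out = MIN
    for z, b in zip(opinions, weights):
        out = output(w_ii * out + b * z, tr)    # assumed input term b * z for one cycle
        for _ in range(think_cycles):           # process with no external input
            out = output(w_ii * out, tr)
    return out
```

With opinions (0.9, 0.6, 0.2) and weights (0.5, 0.3, 0.2), the cell ends at about 0.67, the same value the conventional weighted method gives; once the accumulated input exceeds max, the output saturates at max.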

To solve this problem, a waiting period T and an observation period L, both counted in cycles, are specified for the system. After the system has listened to the last expert's opinion, it is allowed to operate for T cycles. This is equivalent to letting the system think on the basis of the knowledge it stores. The system is then observed over L cycles, during which the state of every cell is recorded. The synthesis result is the mean of these observed values. This also makes it easy to determine whether a cell is oscillating: provided the observation period equals the oscillation period, a mean value of (max + min)/2 indicates that no judgment can be made.
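The waiting and observation periods can be sketched as follows. `step` stands for one synchronous update of all cells and is supplied by the caller; the names `synthesize` and `no_judgment`, and the tolerance used when comparing with (max + min)/2, are illustrative assumptions.

```python
from typing import Callable, List

State = List[float]


def synthesize(step: Callable[[State], State], state: State, T: int, L: int) -> State:
    """Run T waiting cycles, then return each cell's mean over L observed cycles."""
    for _ in range(T):              # waiting period: let the network "think"
        state = step(state)
    sums = [0.0] * len(state)
    for _ in range(L):              # observation period: accumulate cell states
        state = step(state)
        for i, value in enumerate(state):
            sums[i] += value
    return [total / L for total in sums]


def no_judgment(mean: float, max_: float = 1.0, min_: float = 0.0, tol: float = 1e-9) -> bool:
    """True if a cell's mean equals (max + min) / 2, i.e. no judgment can be made."""
    return abs(mean - (max_ + min_) / 2) < tol
```

For example, a cell that flips between max and min on every cycle averages to 0.5 over an even observation period L, so `no_judgment` reports it as undecided, while a cell held at max averages to 1.0.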


Figure  2.  Multi-Expert  Opinion  Synthesis  System 


3.  Results  of  Operation 

Definition 7. If S_g = (s_1, s_2, ..., s_k) represents the result and Z_g = (z_1, z_2, ..., z_k) is an expert's opinion, the degree of acceptance of that expert's opinion is defined as follows:

$$g = \sum_{i=1}^{k} (z_i - s_i)^2$$
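Definition 7 is straightforward to compute. The sketch assumes the measure is the unnormalized sum of squared differences between the expert's opinion and the result, so a smaller value means the opinion is closer to the final result; the function name is our own.

```python
from typing import Sequence


def acceptance(z: Sequence[float], s: Sequence[float]) -> float:
    """Sum of squared differences between an expert's opinion z and the result s."""
    if len(z) != len(s):
        raise ValueError("opinion and result must have the same length")
    return sum((zi - si) ** 2 for zi, si in zip(z, s))
```

For a result S = (1, 0, 1), an expert who said (1, 0, 1) scores 0, while one who said (0, 1, 1) scores 2.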

Simulation revealed the following interesting phenomena:

(1) Self-contradictory opinions produce no decision, or oscillation.

If contradictory knowledge A → … → ¬A is stored in the system, the system oscillates.
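A hypothetical two-cell example shows how contradictory stored knowledge produces oscillation. Cell A excites cell B (weight 1), while B inhibits A (weight -1), so the stored chain reads A → B → ¬A. The constant premise input of 1 to A, the common threshold 0.5, and max = 1, min = 0 are our own choices for illustration, not values from the article.

```python
from typing import List, Tuple

MAX, MIN = 1.0, 0.0
W_AB = 1.0      # A supports B (A -> B)
W_BA = -1.0     # B contradicts A (B -> not A)
PREMISE = 1.0   # constant external input to A (assumed)
TR = 0.5        # threshold of both cells (assumed)

Pair = Tuple[float, float]


def output(s: float, tr: float = TR) -> float:
    """Thresholded output, with s == MAX treated as saturated."""
    if s > tr and s >= MAX:
        return MAX
    if s > tr and MIN < s < MAX:
        return s
    return MIN


def step(state: Pair) -> Pair:
    """One synchronous update of cells A and B."""
    a, b = state
    return (output(PREMISE + W_BA * b), output(W_AB * a))


def trace(state: Pair, n: int) -> List[Pair]:
    """States at cycles 0..n."""
    states = [state]
    for _ in range(n):
        states.append(step(states[-1]))
    return states
```

Starting from (A, B) = (1, 0), the states cycle through (1, 1), (0, 1), (0, 0) and back to (1, 0), so the network never settles; averaged over one period, each cell sits at (max + min)/2.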

(2) The majority prevails.




When a number of experts participate in a debate and the opinion of the majority is consistent, the output of the system agrees with the majority opinion. Of course, the majority opinion must also be consistent with the knowledge learned by the system. In Example 1, w_01 = 0.9, w_12 = -0.8, w_32 = 0.8 and w_31 = 0.9. All other weights are 0, and all thresholds are -3.

Tables 1 and 2 show that the opinion of expert Z0 differs from those of the others, and the final operating result also differs from Z0's opinion.

(3) The minority prevails.

Because the system "ponders" for several cycles after listening to the opinions of all experts, during this period it forgets the opinions of those experts that are incompatible with the knowledge stored in it. Therefore, the system sometimes reaches a conclusion consistent with the minority, as in Example 2.

In this system, w_01 = 0.9, w_12 = -0.8, w_32 = 0.8 and w_31 = 0.9. All other weights are 0, and all thresholds are -3.

Tables 3 and 4 show that the viewpoint of expert Z2 differs from those of the others but is consistent with the knowledge stored in the system. The final result agrees with his opinion.


JPRS-CST-93-004 
3  March  1993 




4.  Learning  From  Samples 

As described earlier, the system architecture includes the interconnections among the various judgments. These interconnections need to be corrected through a learning process during operation. Learning is carried out with the Hebb rule or the Delta rule. If the opinion of a certain expert is extremely important, it is used repeatedly to stimulate the system until the system accepts it.

Assuming X = (x_1, x_2, ..., x_k) is that expert's opinion, the weight from the ith to the jth cell becomes:

Hebb rule:

$$\mathrm{middle} = \frac{\max + \min}{2}$$

$$W_{ij}(t+1) = \alpha\,W_{ij}(t) + \gamma\,(x_i - \mathrm{middle})\,(x_j - \mathrm{middle})$$

where α and γ belong to [0, 1].
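The Hebb update can be sketched as a function on a full weight matrix. This is a minimal illustration that assumes the update form W_ij(t+1) = α W_ij(t) + γ (x_i - middle)(x_j - middle), with max = 1 and min = 0; the function names are hypothetical. Repeated stimulation with the same opinion X drives each weight toward γ (x_i - middle)(x_j - middle)/(1 - α) when α < 1.

```python
from typing import List

Matrix = List[List[float]]


def hebb_update(W: Matrix, x: List[float], alpha: float, gamma: float,
                max_: float = 1.0, min_: float = 0.0) -> Matrix:
    """One Hebb step: W_ij <- alpha * W_ij + gamma * (x_i - middle) * (x_j - middle)."""
    middle = (max_ + min_) / 2
    k = len(x)
    return [[alpha * W[i][j] + gamma * (x[i] - middle) * (x[j] - middle)
             for j in range(k)] for i in range(k)]


def stimulate_repeatedly(x: List[float], alpha: float, gamma: float, repeats: int) -> Matrix:
    """Start from zero weights and present the same opinion x `repeats` times."""
    k = len(x)
    W = [[0.0] * k for _ in range(k)]
    for _ in range(repeats):
        W = hebb_update(W, x, alpha, gamma)
    return W
```

With X = (1, 0, 1), α = 0.5 and γ = 0.4, cells that agree (0 and 2) converge to a weight of 0.2, while cells that disagree (0 and 1) converge to -0.2.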

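Delta rule:

$$\mathrm{out}_i = AF_i(x_1, \ldots, x_k)$$

$$\Delta W_{ji}(t+1) = \gamma\,(x_i - \mathrm{out}_i)\,\mathrm{out}_j$$

$$W_{ji}(t+1) = \alpha\,W_{ji}(t) + \Delta W_{ji}(t+1)$$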

α and γ belong to [0, 1]. Through the use of an auxiliary node R_n, threshold learning can be converted into weight learning. In this work, the threshold tr_i = middle and out_Rn = 1. When it is necessary to learn the threshold, the following rule may be applied. Assuming AF_i is the approximate logic function of node i, then

if AF_i > min and x_i = min, then tr_i^new = AF_i(x_1, x_2, ..., x_k) + γ · max;

if AF_i = min and x_i > min, then tr_i^new = x_i.
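The Delta-rule update and the threshold rule can be sketched in the same way. Because the approximate logic function AF_i is not given in this excerpt, the sketch takes the current outputs out_i (and, for the threshold rule, the value of AF_i) as inputs. The function names, the indexing W[j][i] for the weight from node j to node i, and the update form ΔW_ji = γ (x_i - out_i) out_j are our assumptions.

```python
from typing import List

Matrix = List[List[float]]


def delta_update(W: Matrix, x: List[float], out: List[float],
                 alpha: float, gamma: float) -> Matrix:
    """One Delta step; W[j][i] is the weight from node j to node i (assumed indexing):
    W_ji <- alpha * W_ji + gamma * (x_i - out_i) * out_j."""
    k = len(x)
    return [[alpha * W[j][i] + gamma * (x[i] - out[i]) * out[j]
             for i in range(k)] for j in range(k)]


def new_threshold(tr: float, af: float, x_i: float, gamma: float,
                  max_: float = 1.0, min_: float = 0.0) -> float:
    """Threshold rule for node i; af is the value AF_i(x_1, ..., x_k)."""
    if af > min_ and x_i == min_:
        return af + gamma * max_    # node responds but should not: raise its threshold
    if af == min_ and x_i > min_:
        return x_i                  # node is silent but should respond: set tr to x_i
    return tr                       # otherwise keep the current threshold
```

Only the error at the receiving node i drives the Delta update, so a node whose output already matches the expert's opinion x_i leaves its incoming weights unchanged apart from the decay factor α.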


Hu Hong graduated from the Department of Computer Science at Zhejiang University in 1984. From 1984 to 1987, he worked on computer hardware design at the CAS Institute of Computing Technology. Since 1987, he has been working on artificial intelligence.


Shi Zhongzhi was born in Jiaxing, Jiangsu in 1941. He is a Research Fellow at the National Research Center for Intelligent Computing Systems, CAS Institute of Computing Technology; a part-time professor at the graduate school of USTC and at Dalian Institute of Technology; chairman of the IFIP automatic inference group; vice chairman of the China Artificial Intelligence Society; and an editor of journals such as MOSHI SHIBIE YU RENGONG ZHINENG [PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE]. He has long studied artificial intelligence, neural networks, and databases. He has authored six books and published over 150 papers.

General-Purpose  Parallel  Neural  Network 
Simulation  System  GP2N2S2 

Main  Details 

93P60120A Shenyang XIAOXING WEIXING
JISUANJI XITONG  [MINI-MICRO  SYSTEMS] 
in  Chinese  Vol  13  No  12,  Dec  92  pp  16-21,  32 

[Article by Chen Guoliang [7115 0948 5328], Xiong Yan [3574 3543], and Fang Xiang [2455 4382] of the Dept. of Computer Science, University of Science & Technology of China (USTC), Hefei 230027: "General-Purpose Parallel Neural Network Simulation System GP2N2S2," supported by grants from the State S&T Commission's Basic Research and High Technology Department and the 863 Plan; MS received 21 Jun 92. Cf. brief early report in JPRS-CST-92-010, 22 May 92 p 27]

[Abstract] A Transputer-array-based General-Purpose Parallel Neural Network Simulation System (GP2N2S2) developed at USTC is described. The system provides a neural network simulation language, an editor, a compiler, and a user program execution environment. The user's program, written in sequential code, is automatically executed in parallel on a parallel Transputer system. GP2N2S2 can simulate not only several currently popular neural networks but also new neural network models developed independently by the user.

The GP2N2S2 hardware configuration and system software module schematic are shown below in Figures 1 and 3, respectively. In Figure 3, PEM is the program editing module, PCM is the program compiling module, CSM is the control simulation module, GDM is the graphics display module, ARCM is the array reconfiguration module, ALM is the array loading module, SCM is the simulation command module, NAM is the neuron allocation module, SIDSM is the sampling input and data sending module, SRM is the simulation results module, PCRTCM is the personal computer host to root Transputer communications module, NC is the network communications subprocess, SC is the simulation computation subprocess, and DA is the data access subprocess. Figure 2, not reproduced, shows the system software structure. Figures 4 and 5, reproduced below, show the usual Transputer array configuration and the GP2N2S2 Transputer array configuration, respectively.
Figure 1. GP2N2S2 Hardware Configuration


In the usual configuration, the IBM PC 386 host is connected to a T800 host Transputer, which in turn is connected to other T800 Transputers, including the root Transputer. In the GP2N2S2 configuration, however, the host is connected directly to the T800 root Transputer, which in turn is connected to the other three T800 Transputers in the four-Transputer array.

GP2N2S2 comes with a Neural Network Description Language (NNDL) and Occam 2. It can simulate 12,000 neurons with over 300,000 connection weights, at a processing speed of 80,000 IPS (interconnection weight updates per second); it can simulate a 10-city TSP (Traveling Salesman Problem) in 12 seconds.


Figure 3. System Software Module Schematic
Figure 4. Usual Transputer Array Configuration
[Abstract] The advanced (i.e., high-level) Neural Network Description Language (NNDL) provided by GP2N2S2 is specially designed for writing neural network simulation programs at the network, layer, and node levels. NNDL is a non-procedural language and is therefore quite different from traditional procedural programming languages such as Fortran and C. Several distinctive features of NNDL and its editor/compiler are described.

Although the article has no numbered figures or tables as such, it provides a programming example in which the BP (backpropagation) algorithm is used to solve a mirror-image symmetry problem. The problem is as follows: if the input binary sequence is center-symmetric, the output is 1; otherwise it is 0. For instance, if the input is 011110, the output is 1, but if the input is 011000, the output is 0. Using 6-bit binary sequences, the mirror-image symmetry problem is coded with the following program:
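The training set for this problem is small enough to enumerate. The sketch below is a hypothetical helper, not the NNDL program itself: it lists all 6-bit inputs with their targets. Only 2^3 = 8 of the 64 sequences are symmetric, because the last three bits are fixed by the first three.

```python
from itertools import product
from typing import List, Tuple

Sample = Tuple[Tuple[int, ...], int]


def mirror_symmetry_dataset(n: int = 6) -> List[Sample]:
    """All n-bit inputs, each paired with target 1 if it reads the same reversed, else 0."""
    data = []
    for bits in product((0, 1), repeat=n):
        target = 1 if bits == bits[::-1] else 0   # center-symmetric check
        data.append((bits, target))
    return data
```

The two examples from the text come out as expected: 011110 maps to 1 and 011000 maps to 0.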

In the diagram, the first six nodes in the input layer hold the input binary sequence, while the final two nodes are virtual nodes representing the input-layer threshold and the intermediate-layer threshold, respectively; both are fixed at an input of -1.
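Fixing a virtual node at an input of -1 turns each threshold into an ordinary, learnable weight, because Σ_j w_j x_j - θ = Σ_j w_j x_j + θ · (-1). A minimal check of this identity, with hypothetical function names:

```python
from typing import List


def net_with_threshold(w: List[float], x: List[float], theta: float) -> float:
    """Weighted input minus an explicit threshold theta."""
    return sum(wi * xi for wi, xi in zip(w, x)) - theta


def net_with_virtual_node(w: List[float], x: List[float], theta: float) -> float:
    """Same quantity, with theta stored as the weight of a virtual input fixed at -1."""
    return sum(wi * xi for wi, xi in zip(w + [theta], x + [-1.0]))
```

Because the two forms are identical, BP can adjust the thresholds with the same update it applies to every other weight.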
The network topological structure is shown in the following diagram:
Parallel Simulation Controller

93P60120C  Shenyang  XIAOXING  WEIXING 
JISUANJI XITONG  [MINI-MICRO  SYSTEMS] 
in  Chinese  Vol  13  No  12,  Dec  92  pp  29-32 

[Article by Huang Junbin [7806 0193 2430] and Chen Guoliang of the Dept. of Computer Science, USTC, Hefei 230027: "Design, Implementation of GP2N2S2 Parallel Simulation Controller," supported by grants from the State S&T Commission's Basic Research and High Technology Dept. and the 863 Plan; MS received 21 Jun 92]

[Abstract] The parallel simulation controller (PSC), GP2N2S2's core software, is introduced, and its design concept and implementation are described. The PSC includes three parts: the NC (sub)process, the DA (sub)process, and the SC (sub)process, all three executed in parallel within the Transputer array.

One  figure  (not  reproduced)  shows  the  Transputer  array. 


Central  Control  Block 

93P60120D  Shenyang  XIAOXING  WEIXING 
JISUANJI  XITONG  [MINI-MICRO  SYSTEMS] 
in  Chinese  Vol  13  No  12,  Dec  92  pp  33-36 

[Article by Zhang Qingjun [1728 1987 6511] and Chen Guoliang of the Dept. of Computer Science, USTC, Hefei 230027: "Design, Implementation of GP2N2S2 Central Control Block," supported by grants from the State S&T Commission's Basic Research and High Technology Dept. and the 863 Plan; MS received 21 Jun 92]

[Abstract] The Central Control Block (CCB) is the center of both the data stream and the control stream of GP2N2S2, an MIMD [multiple instruction stream/multiple data stream] system. The CCB handles system data collection, processing, and sending, task assignment, etc. The implementation of the CCB is detailed, its task assignment strategy is discussed, and the system datagram protocol related to the CCB implementation is described.

The GP2N2S2 system software consists of five large modules: the compiler module (NNC), the CCB, the PSC, the algorithm library (LIB), and the dynamic graphics simulator (part of the System Integrated Environment, or SIE), as shown in Figure 1 below. The CCB itself consists of three parts, labeled CCB1, CCB2, and CCB3 in the figure. Table 1 (not reproduced) lists NNC output data and the corresponding NIP (Network Information Package) and NSP (Network Structure Package) data, while Table 2 (not reproduced) lists NTP (Network State Package) and DIP (Dynamic Information Package) data.
Algorithmic Library, Applications

93P60120E  Shenyang  XIAOXING  WEIXING 
JISUANJI XITONG  [MINI-MICRO  SYSTEMS] 
in  Chinese  Vol  13  No  12,  Dec  92  pp  37-43,  48 


[Article by Zhang Qun [1728 5028] and Chen Guoliang of the Dept. of Computer Science, USTC, Hefei 230027: "Establishment of the GP2N2S2 Algorithmic Library, Its Applications," supported by grants from the State S&T Commission's Basic Research and High Technology Dept. and the 863 Plan; MS received 1 Jun 92]

[Abstract] The GP2N2S2 algorithm library (LIB) includes several popular neural network algorithms, such as BP (Backpropagation), Kohonen, and Hopfield. These have been employed to solve some real problems, such as the TSP, Associative Memory (AM), and mirror symmetry. In addition, the dynamic code loading method has been used to implement simulation calls for user-defined algorithms.

Ten figures (not reproduced) show a BP network, a single-node calculation flow chart for a BP algorithm, a mirror-symmetry network structure, Rumelhart's simulation results, USTC's simulation results, a Kohonen network, a single-node calculation flow chart for the Kohonen algorithm, a Kohonen network used to solve the TSP, standard samples of numerals for NN recognition, and the simulation results for noisy numeral input samples.



Integrated  Environment,  Dynamic  Graphical 
Simulator 

93P60120F  Shenyang  XIAOXING  WEIXING 
JISUANJI  XITONG  [MINI-MICRO  SYSTEMS] 
in  Chinese  Vol  13  No  12,  Dec  92  pp  44-48 

[Article by Qin Xiaoou [4440 1420 7743] and Chen Guoliang of the Dept. of Computer Science, USTC, Hefei 230027: "Design, Characteristics of GP2N2S2 Integrated Environment, Dynamic Graphical Simulator," supported by grants from the State S&T Commission's Basic Research and High Technology Dept. and the 863 Plan; MS received 1 Jun 92]

[Abstract] The GP2N2S2 system integrated environment (SIE) provides users with a common interface supporting editing, compiling, and running. The dynamic graphical simulator, which simulates the dynamic processes of artificial NNs, belongs to and is driven by this environment. The design and features of the two parts are discussed.

Two figures, both reproduced below, show the architecture of the SIE and the structure of the dynamic simulation controller.
Figure 1. Architecture of SIE


Figure  2.  Structure  of  Dynamic  Simulation  Controller 
On Design, Analysis of MPLPC [Multi-Pulse-Excitation Linear Predictive Coding] Vector Quantizer Based on Artificial Neural Network

40100052A  Beijing  TONGXIN  XUEBAO  [JOURNAL 
OF  CHINA  INSTITUTE  OF  COMMUNICATIONS] 
in  Chinese  Vol  13  No  5,  Sep  92  pp  3-10 

[English abstract of article by Xu Bingzheng and Peng Lei of South China University of Technology, Guangzhou; MS received 18 Oct 91]

[Text] The application of neural networks to speech compression coding is discussed. A vector quantizer based on an artificial neural network is presented that can be used to quantize the multi-pulse excitation analysis model of the speech signal. The quantizing network is somewhat similar to Kohonen's net. It performs parameter analysis and quantization together. Compared with the traditional method, which performs analysis first and quantization afterward, it has some excellent properties. We provide the architecture and learning rule of the quantizing network and compare its implementation with that of the traditional method. Finally, we simulate the quantizing network with practical speech signals. The experimental results show that our scheme is feasible.
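The abstract gives no details of the quantizing network, so as a generic reference point, here is plain winner-take-all competitive learning (Kohonen's update rule with no neighborhood) used to train a vector-quantization codebook. This is not the authors' network; the learning-rate schedule, the epoch count, and the function names are arbitrary choices for illustration.

```python
import random
from typing import List, Sequence

Vector = Sequence[float]


def nearest(codebook: List[List[float]], v: Vector) -> int:
    """Index of the codeword closest to v in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda k: sum((c - x) ** 2 for c, x in zip(codebook[k], v)))


def train_vq(data: List[Vector], n_codes: int, epochs: int = 20,
             lr0: float = 0.5, seed: int = 0) -> List[List[float]]:
    """Winner-take-all competitive learning (Kohonen's rule without a neighborhood)."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(data, n_codes)]  # initialize from samples
    for epoch in range(epochs):
        lr = lr0 * (1 - epoch / epochs)          # linearly decaying learning rate
        for v in rng.sample(data, len(data)):    # visit samples in random order
            k = nearest(codebook, v)             # competition: find the winner
            codebook[k] = [c + lr * (x - c) for c, x in zip(codebook[k], v)]
    return codebook
```

After training, `nearest(codebook, v)` is the quantizer: each input vector is encoded by the index of its winning codeword. On two well-separated clusters, the two codewords settle one in each cluster.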

Modified  Kohonen  Self-Organizing  Neural 
Network,  Adaptive  Vector  Quantization  of  Images 

40100052B  Beijing  TONGXIN  XUEBAO  [JOURNAL 
OF  CHINA  INSTITUTE  OF  COMMUNICATIONS] 
in  Chinese  Vol  13  No  5,  Sep  92  pp  16-21 

[English  abstract  of  article  by  Wang  Wei,  Cai  Dejun,  and 
Wan  Faguan  of  Huazhong  University  of  Science  and 
Technology,  Wuhan;  MS  received  14  Oct  91] 

[Text] Based on a discussion of the principle of Kohonen's self-organizing feature maps (SOFM), a modified SOFM (MSOFM) algorithm is proposed to reduce the blocking effect in vector quantization of images. Two characteristic values are defined in the DCT (Discrete Cosine Transform) domain to classify image blocks, and the application of the MSOFM algorithm to adaptive vector quantization is then discussed. The results of computer simulation show that the MSOFM training algorithm significantly reduces the blocking effect and performs better than the SOFM algorithm.

New  Models  of  Holographic  Network,  Hamming 
Net,  Their  Application  in  Handwritten 
Chinese-Character  Recognition 

40100052C  Beijing  TONGXIN  XUEBAO  [JOURNAL 
OF  CHINA  INSTITUTE  OF  COMMUNICATIONS] 
in  Chinese  Vol  13  No  5,  Sep  92  pp  54-59 

[English  abstract  of  article  by  Yu  Yinglin  and  Deng  Da 
of  South  China  University  of  Technology,  Guangzhou; 
MS  received  16  Mar  92] 

[Text] We propose two new models, one for holographic memory and the other a fast-converging Hamming net, both with an automatic attention-shifting function. Satisfactory results have been obtained with a handwritten Chinese-character recognition system built on these new models.
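For orientation, the classic Hamming net (as described by Lippmann) works in two stages: a matching layer scores each stored exemplar by n minus its Hamming distance to the input, and a MAXNET layer applies mutual inhibition until only the best match remains active. The sketch below implements that textbook version for ±1 vectors. It is not the authors' fast-converging variant or their attention-shifting mechanism, and the inhibition constant ε = 1/(2m) is our choice within the usual requirement ε < 1/m.

```python
from typing import List


def hamming_net(exemplars: List[List[int]], x: List[int], max_iter: int = 1000) -> int:
    """Return the index of the stored +/-1 exemplar closest to x in Hamming distance."""
    n, m = len(x), len(exemplars)
    eps = 1.0 / (2 * m)  # mutual inhibition strength; must satisfy eps < 1/m
    # Matching layer: (n + e.x) / 2 equals n minus the Hamming distance.
    y = [(n + sum(e_i * x_i for e_i, x_i in zip(e, x))) / 2 for e in exemplars]
    # MAXNET: each node inhibits the others until at most one stays positive.
    for _ in range(max_iter):
        if sum(1 for v in y if v > 0) <= 1:
            break
        total = sum(y)
        y = [max(0.0, v - eps * (total - v)) for v in y]
    return max(range(m), key=lambda k: y[k])
```

With the exemplars (1, 1, 1, 1), (-1, -1, -1, -1) and (1, -1, 1, -1), the input (1, 1, -1, 1) differs from the first exemplar in one bit and from each of the others in three, so the MAXNET leaves only the first node active.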

Applications  of  Neural  Network  for  Handwritten 
Digit  Recognition 

40100052D Beijing TONGXIN XUEBAO [JOURNAL
OF  CHINA  INSTITUTE  OF  COMMUNICATIONS] 
in  Chinese  Vol  13  No  5,  Sep  92  pp  60-64 

[English  abstract  of  article  by  Wang  Minghui,  Pan 
Xinan,  and  Shen  Min  of  Beijing  University  of  Posts  and 
Telecommunications;  MS  received  2  Mar  92] 
[Text] A feedforward multilayer neural network with a backpropagation learning algorithm is used to recognize handwritten digits written by 40 persons. First, an HP scanner converts the original digit images into binary images. Then additional preprocessing is performed to segment and normalize the digits: the binary images are scaled to a 32 x 32 pixel matrix, and features are extracted to change the representation of each digit from a pixel matrix to a feature description. Finally, the computer simulation achieves an error rate of 0.4 percent at a 25 percent rejection rate. Some problems encountered in the experiment are also discussed.

Learning  Algorithm  for  Speech  Recognition  With 
Recurrent  Neural  Network 

40100052E  Beijing  TONGXIN  XUEBAO  [JOURNAL 
OF  CHINA  INSTITUTE  OF  COMMUNICATIONS] 
in  Chinese  Vol  13  No  5,  Sep  92  pp  76-79 

[English abstract of article by Li Haizhou and Xu Bingzheng of South China University of Technology, Guangzhou; MS received 2 Nov 91]

[Text] Learning to associate static input/output pairs can be accomplished by layered connectionist networks with feedforward links alone, but feedback links are required to provide the network with state-sequence information in order to capture sequential behavior. In this paper, a multilayer network architecture is built from dynamic neurons that have multiple local feedback loops. The proposed network can be trained to memorize sequential patterns. A new learning algorithm, which is more effective and easier to implement, is also derived. Finally, experiments on speech recognition of spoken Chinese numbers are designed to explore the capability of the proposed networks to learn the dynamic properties of time-varying data. The performance of dynamic neurons with different time-delay periods is also shown.